Re: choosing placement upon RESTORE

2017-05-02 Thread xavier jmlucjav
thanks Mikhail, that sounds like it would help me, as it allows you to set
createNodeSet on RESTORE calls.
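
For concreteness, a sketch of what such a call would look like (host, collection, and backup names are made up, and createNodeSet on RESTORE is exactly what SOLR-9527 tracks, so it is not in 6.5.1 yet):

```shell
# Assembles (and just prints) the hypothetical RESTORE call; pass it to curl
# once the backup 'location' is on storage every node can read.
NODE="nodeA:8983_solr"
URL="http://localhost:8983/solr/admin/collections?action=RESTORE"
URL="${URL}&name=mybackup&collection=mycoll&location=/mnt/shared/backups"
URL="${URL}&createNodeSet=${NODE}"
echo "$URL"
```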

On Tue, May 2, 2017 at 2:50 PM, Mikhail Khludnev <m...@apache.org> wrote:

> This sounds relevant, but different to https://issues.apache.org/
> jira/browse/SOLR-9527
> You may want to follow this ticket.
>
> On Mon, May 1, 2017 at 9:15 PM, xavier jmlucjav <jmluc...@gmail.com>
> wrote:
>
>> hi,
>>
>> I am facing this situation:
>> - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's
>> just
>> for dev work)
>> - the collections were created with:
>>action=CREATE&...=EMPTY"
>> then
>>   action=ADDREPLICA&...=$NODEA=$DATADIR"
>> - I have taken a BACKUP of the collections
>> - Solr is upgraded to 6.5.1
>>
>> Now, I started using RESTORE to restore the collections on node A
>> (where they lived before), but instead of all being created on node A,
>> the collections were created on nodes A, B, and C. Well, SolrCloud tried
>> to: the 2nd and 3rd RESTOREs failed, since the backup was on node A's
>> disk, not reachable from nodes B and C.
>>
>> How is this supposed to work? I am looking at Rule Based Placement, but
>> it seems it is only available for CREATESHARD, so can I use it in RESTORE?
>> Isn't there a way to force Solrcloud to create the collection in a given
>> node?
>>
>> thanks!
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


choosing placement upon RESTORE

2017-05-01 Thread xavier jmlucjav
hi,

I am facing this situation:
- I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's just
for dev work)
- the collections were created with:
   action=CREATE&...=EMPTY"
then
  action=ADDREPLICA&...=$NODEA=$DATADIR"
- I have taken a BACKUP of the collections
- Solr is upgraded to 6.5.1

Now, I started using RESTORE to restore the collections on node A
(where they lived before), but instead of all being created on node A,
the collections were created on nodes A, B, and C. Well, SolrCloud tried
to: the 2nd and 3rd RESTOREs failed, since the backup was on node A's
disk, not reachable from nodes B and C.

How is this supposed to work? I am looking at Rule Based Placement, but it
seems it is only available for CREATESHARD, so can I use it in RESTORE?
Isn't there a way to force Solrcloud to create the collection in a given
node?
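
For reference, the pair of calls I am running is roughly this (names and paths are examples); note that RESTORE reads 'location' on whichever node SolrCloud picks, which is why it only worked where the backup disk was visible:

```shell
# Prints the BACKUP/RESTORE calls (examples only, not run against a server).
BASE="http://nodeA:8983/solr/admin/collections"
echo "${BASE}?action=BACKUP&name=mybackup&collection=mycoll&location=/backups"
echo "${BASE}?action=RESTORE&name=mybackup&collection=mycoll&location=/backups"
```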

thanks!


DIH: last_index_time not updated on if 0 docs updated

2017-02-27 Thread xavier jmlucjav
Hi,

After making our interval for calling the delta import shorter and shorter, I
have found out that last_index_time in dataimport.properties is not
updated every time the indexing runs: it is skipped if no docs were added.

This happens at least in the following scenario:
- running delta as full index
( /dataimport?command=full-import=false=true )
- Solrcloud setup, so dataimport.properties is in zookeeper
- Solr 5.5.0
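
(The command URL above lost its parameter names in archiving; a delta-via-full-import call typically looks like the following, where clean=false and commit=true are my guess at what the original said:)

```shell
# Assumed reconstruction of the delta-as-full-import call; parameter values
# are a guess, only the command=full-import part is certain from the post.
URL="http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true"
echo "$URL"
```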

I understand skipping the commit on the index if no docs were updated is a
nice optimization, but I believe the last_index_time info should be updated
in all cases, so it reflects reality. We, for instance, are looking at this
piece of information in order to do other stuff.

I could not find any mention of this in Jira, so I wonder: is this
intended, or has nobody had an issue with it yet?

xavier


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
hi Shawn,

as I replied to Markus, of course I know (and use) the collections API to
reload the config. I am asking what would happen in this scenario:
 - config updated (but collection not reloaded)
 - I restart one node
Now one node has the new config and the rest the old one?

To which he already replied:
>The restarted/reloaded node has the new config, the others have the old
config until reloaded/restarted.

I was not asking about making Solr restart itself; my English must be worse
than I thought. By the way, stuff like that can be achieved with
http://yajsw.sourceforge.net/ a very powerful Java wrapper; I used to use
it when Solr did not have a built-in daemon setup. It was built by someone
who was using JSW and got pissed off when that one went commercial. It is very
configurable, but of course more complex. I wrote something about it some
time ago
https://medium.com/@jmlucjav/how-to-install-solr-as-a-service-in-any-platform-including-solr-5-8e4a93cc3909

thanks

On Thu, Feb 9, 2017 at 4:53 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/9/2017 5:24 AM, xavier jmlucjav wrote:
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the rest alone. If all the nodes contain a shard for C, what happens, N1 is
> using the new config and the rest are not? how is this handled?
>
> If you want to change the config or schema for a collection and make it
> active across all nodes, just use the collections API to RELOAD the
> collection.  The change will be picked up everywhere.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> To answer your question: No.  Solr does not have the ability to restart
> itself.  It would require significant development effort and a
> fundamental change in how Solr is started to make it possible.  It is
> something that has been discussed, but at this time it is not possible.
>
> One idea that would make this possible is mentioned on the following
> wiki page.  It talks about turning Solr into two applications instead of
> one:
>
> https://wiki.apache.org/solr/WhyNoWar#Information_that.27s_
> not_version_specific
>
> Again -- it would not be easy, which is why it hasn't been done yet.
>
> Thanks,
> Shawn
>
>


Re: procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi Markus,

yes, of course I know (and use) the collections API to reload the config. I
am asking what would happen in this scenario:
- config updated (but collection not reloaded)
- I restart one node

Now one node has the new config and the rest the old one?

Regarding restarting many hosts, my question is whether we can just
'restart' each Solr and that is enough, or whether it is better to first
stop them all, and then start them all.
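
(For reference, the reload I mean is a single collections-API call; collection name is an example:)

```shell
# Prints the RELOAD call for one collection (example name).
echo "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycoll"
```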

thanks


On Thu, Feb 9, 2017 at 1:28 PM, Markus Jelsma 
wrote:

> Hello - if you just want to use updated configuration, you can use Solr's
> collection reload API call. For restarting we rely on remote provisioning
> tools such as Salt, other managing tools can probably execute commands
> remotely as well.
>
> If you operate more than just a very few machines, i'd really recommend
> using these tools.
>
> Markus
>
>
>
> -Original message-
> > From:xavier jmlucjav 
> > Sent: Thursday 9th February 2017 13:24
> > To: solr-user 
> > Subject: procedure to restart solrcloud, and config/collection
> consistency
> >
> > Hi,
> >
> > When I need to restart a Solrcloud cluster, I always do this:
> > - log in into host nb1, stop solr
> > - log in into host nb2, stop solr
> > -...
> > - log in into host nbX, stop solr
> > - verify all hosts did stop
> > - in host nb1, start solr
> > - in host nb2, start solr
> > -...
> >
> > I always wondered, if this was not really needed, and I could just call
> > 'restart' in every node, in a quick loop, and forget about it. Does
> anyone
> > know if this is the case?
> >
> > My doubt is in regards to changing some config, and then doing the above
> > (just restart nodes in a loop). For example, what if I change a config G
> > used in collection C, and I restart just one of the nodes (N1), leaving
> the
> > rest alone. If all the nodes contain a shard for C, what happens, N1 is
> > using the new config and the rest are not? how is this handled?
> >
> > thanks
> > xavier
> >
>


procedure to restart solrcloud, and config/collection consistency

2017-02-09 Thread xavier jmlucjav
Hi,

When I need to restart a Solrcloud cluster, I always do this:
- log in into host nb1, stop solr
- log in into host nb2, stop solr
-...
- log in into host nbX, stop solr
- verify all hosts did stop
- in host nb1, start solr
- in host nb2, start solr
-...

I always wondered if this was really needed, or whether I could just call
'restart' on every node in a quick loop and forget about it. Does anyone
know if this is the case?
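
The two variants, sketched (nb1..nb3 and ssh stand in for however you reach each host; Solr install paths are assumptions):

```shell
HOSTS="nb1 nb2 nb3"
# variant 1: stop everything, verify, then start everything
for h in $HOSTS; do echo "ssh $h /opt/solr/bin/solr stop"; done
for h in $HOSTS; do echo "ssh $h /opt/solr/bin/solr start -c"; done
# variant 2: a plain rolling restart, one node at a time
for h in $HOSTS; do echo "ssh $h /opt/solr/bin/solr restart -c"; done
```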

My doubt is in regards to changing some config and then doing the above
(just restarting nodes in a loop). For example, what if I change a config G
used in collection C, and I restart just one of the nodes (N1), leaving the
rest alone? If all the nodes contain a shard for C, what happens: is N1
using the new config while the rest are not? How is this handled?

thanks
xavier


reuse a org.apache.lucene.search.Query in Solrj?

2017-01-05 Thread xavier jmlucjav
Hi,

I have a Lucene Query (a BooleanQuery with a bunch of possibly complex
spatial queries, even polygons etc.) that I am building for some MemoryIndex
stuff.

Now I need to add that same query to a Solr query (adding it to a bunch of
other fq I am using). Is there some way to piggyback the Lucene query this
way? It would be extremely handy in my situation.

thanks
xavier


solrj: get to which shard a id will be routed

2016-12-22 Thread xavier jmlucjav
Hi

Is there somewhere a sample of some solrj code that given:
- a collection
- the id (like "IBM!12345")

returns the shard to which the doc will be routed? I was hoping to get that
info from CloudSolrClient itself, but it is not exposing it as far as I can
see.

thanks
xavier
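
For what it's worth, the routing lives in the collection's DocRouter (ClusterState.getCollection(c).getRouter(), then getTargetSlice(...)), which CloudSolrClient keeps internal. The sketch below shows only the composite-id route-key step in plain Java; the class name is mine, and the real router then murmur3-hashes the route key against each Slice's hash range, which is not reproduced here:

```java
// Sketch of the route-key extraction CompositeIdRouter performs on an id
// like "IBM!12345": everything before '!' decides the shard. The actual
// shard lookup additionally hashes this key and matches it against each
// Slice's hash range (omitted; needs a live DocCollection).
public class RouteKeySketch {
    static String routeKey(String id) {
        int bang = id.indexOf('!');
        // No '!' means the whole id is hashed.
        return bang >= 0 ? id.substring(0, bang) : id;
    }
    public static void main(String[] args) {
        System.out.println(routeKey("IBM!12345")); // prints IBM
    }
}
```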


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
done, with a simple patch https://issues.apache.org/jira/browse/SOLR-9697

On Thu, Oct 27, 2016 at 4:21 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> sure, will do, I tried before but I could not create a Jira, now I can,
> not sure what was happening.
>
> On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
>> out soon and we'd have to hurry if this fix has to go in.
>>
>> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com>
>> wrote:
>>
>> > Correcting myself here, I was wrong about the cause (I had already
>> messed
>> > with the script).
>> >
>> > I made it work by commenting out line 1261 (the number might be a bit
>> off
>> > as I have modified the script, but hopefully its easy to see where):
>> >
>> > ) ELSE IF "%1"=="/?" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-h" (
>> >   goto zk_usage
>> > ) ELSE IF "%1"=="-help" (
>> >   goto zk_usage
>> > ) ELSE IF "!ZK_SRC!"=="" (
>> >   if not "%~1"=="" (
>> > goto set_zk_src
>> >   )
>> >  * rem goto zk_usage*
>> > ) ELSE IF "!ZK_DST!"=="" (
>> >   IF "%ZK_OP%"=="cp" (
>> > goto set_zk_dst
>> >   )
>> >   IF "%ZK_OP%"=="mv" (
>> > goto set_zk_dst
>> >   )
>> >   set ZK_DST="_"
>> > ) ELSE IF NOT "%1"=="" (
>> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
>> >
>> > Now upconfig works!
>> >
>> > thanks
>> > xavier
>> >
>> >
>> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com>
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > Am I missing something or this is broken in windows? I cannot
>> upconfig,
>> > > the scripts keeps exiting immediately and showing usage, as if I use
>> some
>> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
>> it
>> > > also on win7 (don't have that around anymore to try)
>> > >
>> > > I think the issue is: there is a SHIFT too much in line 1276 of
>> solr.cmd:
>> > >
>> > > :set_zk_op
>> > > set ZK_OP=%~1
>> > > SHIFT
>> > > goto parse_zk_args
>> > >
>> > > if this SHIFT is removed, then parse_zk_args works (and it does the
>> shift
>> > > itself). But the upconfig hangs, so still it does not work.
>> > >
>> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
>> 4dc7576d8e
>> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
>> > issue.
>> > > Would be great if this was fixed in the incoming 6.3...
>> > >
>> > > My cmd scripting is not too strong and I did not go further. I
>> searched
>> > > Jira but found nothing. By the way is it not possible to open tickets
>> in
>> > > Jira anymore?
>> > >
>> > > xavier
>> > >
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
sure, will do. I tried before but I could not create a Jira; now I can, not
sure what was happening.

On Thu, Oct 27, 2016 at 3:14 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Would you mind opening a jira issue and give a patch (diff)? 6.3 is coming
> out soon and we'd have to hurry if this fix has to go in.
>
> On Thu, Oct 27, 2016 at 6:32 PM, xavier jmlucjav <jmluc...@gmail.com>
> wrote:
>
> > Correcting myself here, I was wrong about the cause (I had already messed
> > with the script).
> >
> > I made it work by commenting out line 1261 (the number might be a bit off
> > as I have modified the script, but hopefully its easy to see where):
> >
> > ) ELSE IF "%1"=="/?" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-h" (
> >   goto zk_usage
> > ) ELSE IF "%1"=="-help" (
> >   goto zk_usage
> > ) ELSE IF "!ZK_SRC!"=="" (
> >   if not "%~1"=="" (
> > goto set_zk_src
> >   )
> >  * rem goto zk_usage*
> > ) ELSE IF "!ZK_DST!"=="" (
> >   IF "%ZK_OP%"=="cp" (
> > goto set_zk_dst
> >   )
> >   IF "%ZK_OP%"=="mv" (
> > goto set_zk_dst
> >   )
> >   set ZK_DST="_"
> > ) ELSE IF NOT "%1"=="" (
> >   set ERROR_MSG="Unrecognized or misplaced zk argument %1%"
> >
> > Now upconfig works!
> >
> > thanks
> > xavier
> >
> >
> > On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com>
> > wrote:
> >
> > > hi,
> > >
> > > Am I missing something or this is broken in windows? I cannot upconfig,
> > > the scripts keeps exiting immediately and showing usage, as if I use
> some
> > > wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw
> it
> > > also on win7 (don't have that around anymore to try)
> > >
> > > I think the issue is: there is a SHIFT too much in line 1276 of
> solr.cmd:
> > >
> > > :set_zk_op
> > > set ZK_OP=%~1
> > > SHIFT
> > > goto parse_zk_args
> > >
> > > if this SHIFT is removed, then parse_zk_args works (and it does the
> shift
> > > itself). But the upconfig hangs, so still it does not work.
> > >
> > > this probably was introduced in a851d5f557aefd76c01ac23da076a1
> 4dc7576d8e
> > > by Erick (not sure which one :) ) on July 2nd. Master still has this
> > issue.
> > > Would be great if this was fixed in the incoming 6.3...
> > >
> > > My cmd scripting is not too strong and I did not go further. I searched
> > > Jira but found nothing. By the way is it not possible to open tickets
> in
> > > Jira anymore?
> > >
> > > xavier
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: 'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
Correcting myself here, I was wrong about the cause (I had already messed
with the script).

I made it work by commenting out line 1261 (the number might be a bit off
as I have modified the script, but hopefully it's easy to see where):

) ELSE IF "%1"=="/?" (
  goto zk_usage
) ELSE IF "%1"=="-h" (
  goto zk_usage
) ELSE IF "%1"=="-help" (
  goto zk_usage
) ELSE IF "!ZK_SRC!"=="" (
  if not "%~1"=="" (
goto set_zk_src
  )
 * rem goto zk_usage*
) ELSE IF "!ZK_DST!"=="" (
  IF "%ZK_OP%"=="cp" (
goto set_zk_dst
  )
  IF "%ZK_OP%"=="mv" (
goto set_zk_dst
  )
  set ZK_DST="_"
) ELSE IF NOT "%1"=="" (
  set ERROR_MSG="Unrecognized or misplaced zk argument %1%"

Now upconfig works!

thanks
xavier


On Thu, Oct 27, 2016 at 2:43 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> hi,
>
> Am I missing something or this is broken in windows? I cannot upconfig,
> the scripts keeps exiting immediately and showing usage, as if I use some
> wrong parameters.  This is on win10, jdk8. But I am pretty sure I saw it
> also on win7 (don't have that around anymore to try)
>
> I think the issue is: there is a SHIFT too much in line 1276 of solr.cmd:
>
> :set_zk_op
> set ZK_OP=%~1
> SHIFT
> goto parse_zk_args
>
> if this SHIFT is removed, then parse_zk_args works (and it does the shift
> itself). But the upconfig hangs, so still it does not work.
>
> this probably was introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e
> by Erick (not sure which one :) ) on July 2nd. Master still has this issue.
> Would be great if this was fixed in the incoming 6.3...
>
> My cmd scripting is not too strong and I did not go further. I searched
> Jira but found nothing. By the way is it not possible to open tickets in
> Jira anymore?
>
> xavier
>


'solr zk upconfig' etc not working on windows since 6.1 at least

2016-10-27 Thread xavier jmlucjav
hi,

Am I missing something, or is this broken on Windows? I cannot upconfig; the
script keeps exiting immediately and showing usage, as if I used some wrong
parameters. This is on Win10, JDK 8. But I am pretty sure I saw it also on
Win7 (don't have that around anymore to try).

I think the issue is: there is a SHIFT too much in line 1276 of solr.cmd:

:set_zk_op
set ZK_OP=%~1
SHIFT
goto parse_zk_args

if this SHIFT is removed, then parse_zk_args works (and it does the shift
itself). But then the upconfig hangs, so it still does not work.

this was probably introduced in a851d5f557aefd76c01ac23da076a14dc7576d8e by
Erick (not sure which one :) ) on July 2nd. Master still has this issue.
Would be great if this was fixed in the upcoming 6.3...

My cmd scripting is not too strong and I did not go further. I searched
Jira but found nothing. By the way is it not possible to open tickets in
Jira anymore?

xavier


Re: JNDI settings

2016-09-26 Thread xavier jmlucjav
I did set up JNDI for DIH once, and you have to tweak the Jetty setup. Of
course, Solr should have its own Jetty instance; the old way of being just
a war is not true anymore. I don't remember where, but there should be some
instructions somewhere; it took me an afternoon to set it up properly.
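
For the record, the tweak was along these lines: a Jetty JNDI resource (a sketch from memory, so treat it as an assumption; the datasource class, names, and credentials are examples, and you also need Jetty's JNDI/plus modules enabled plus the JDBC driver jar on Jetty's classpath):

```xml
<!-- in jetty.xml (or a jetty-env.xml equivalent); illustrative only -->
<New id="dihDS" class="org.eclipse.jetty.plus.jndi.Resource">
  <Arg></Arg>
  <Arg>jdbc/dihDS</Arg>
  <Arg>
    <New class="org.postgresql.ds.PGSimpleDataSource">
      <Set name="ServerName">dbhost</Set>
      <Set name="DatabaseName">mydb</Set>
      <Set name="User">solr</Set>
      <Set name="Password">secret</Set>
    </New>
  </Arg>
</New>
```

If I recall correctly, DIH's JdbcDataSource can then point at it via a jndiName attribute such as java:comp/env/jdbc/dihDS.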

xavier

On Wed, Sep 21, 2016 at 1:15 PM, Aristedes Maniatis 
wrote:

> On 13/09/2016 1:29am, Aristedes Maniatis wrote:
> > I am using Solr 5.5 and wanting to add JNDI settings to Solr (for data
> import). I'm new to Solr Cloud setup (previously I was running Solr running
> as a custom bundled war) so I can't figure where to put the JNDI settings
> with user/pass themselves.
> >
> > I don't want to add it to jetty.xml because that's part of the packaged
> application which will be upgraded from time to time.
> >
> > Should it go into solr.xml inside the solr.home directory? If so, what's
> the right syntax there?
>
>
> Just a follow up on this question. Does anyone know of how I can add JNDI
> settings to Solr without overwriting parts of the application itself?
>
> Cheers
> Ari
>
>
>
> --
> -->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>


[ANN] sadat: generate fake docs for your Solr index

2014-03-17 Thread xavier jmlucjav
Hi,

A couple of times I found myself in the following situation: I had to work
on a Solr schema, but had no docs to index yet (the db was not ready etc).

In order to start learning js, I needed some small project to practice, so
I thought of this small utility. It allows you to generate fake docs to
index, so you can at least advance with the schema/solrconfig design.

Currently it can generate (based on your current schema) the most basic
field types (int, float, boolean, text, date), and user-defined functions
can be plugged in for customized generation.

Have a look at https://github.com/jmlucjav/sadat


Re: When is/should qf different from pf?

2013-10-29 Thread xavier jmlucjav
I am confused. Wouldn't a doc that matches both the phrase and the term
queries have a better score than a doc matching only the term query, even
if qf and pf are the same?


On Mon, Oct 28, 2013 at 7:54 PM, Upayavira u...@odoko.co.uk wrote:

 There'd be no point having them the same.

 You're likely to include boosts in your pf, so that docs that match the
 phrase query as well as the term query score higher than those that just
 match the term query.

 Such as:

   qf=text description&pf=text^2 description^4

 Upayavira

 On Mon, Oct 28, 2013, at 05:44 PM, Amit Nithian wrote:
  Thanks Erick. Numeric fields make sense, as I guess would strict string
  fields too, since it's one term? In the normal text-searching case, though,
  does it make sense to have qf and pf differ?
 
  Thanks
  Amit
  On Oct 28, 2013 3:36 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   The facetious answer is when phrases aren't important in the fields.
   If you're doing a simple boolean match, adding phrase fields will add
   expense, to no good purpose etc. Phrases on numeric
   fields seems wrong.
  
   FWIW,
   Erick
  
  
   On Mon, Oct 28, 2013 at 1:03 AM, Amit Nithian anith...@gmail.com
 wrote:
  
Hi all,
   
I have been using Solr for years but never really stopped to wonder:
   
When using the dismax/edismax handler, when do you have the qf
 different
from the pf?
   
I have always set them to be the same (maybe different weights) but
 I was
wondering if there is a situation where you would have a field in
 the qf
not in the pf or vice versa.
   
My understanding from the docs is that qf is a term-wise hard filter
   while
 pf is a phrase-wise boost of documents that made it past the qf
 filter.
   
Thanks!
Amit
   
  



Re: do SearchComponents have access to response contents

2013-04-05 Thread xavier jmlucjav
I knew I could do that at the Jetty level with a servlet, for instance, but
the user wants to do this inside Solr code itself. Now that you mention the
logs... that could be a solution without modifying the webapp...

thanks for the input!
xavier


On Fri, Apr 5, 2013 at 7:55 AM, Amit Nithian anith...@gmail.com wrote:

 We need to also track the size of the response (as the size in bytes of
 the
 whole xml response that is streamed, with stored fields and all). I was a
 bit worried cause I am wondering if a searchcomponent will actually have
 access to the response bytes...

 == Can't you get this from your container access logs after the fact? I
 may be misunderstanding something but why wouldn't mining the Jetty/Tomcat
 logs for the response size here suffice?

 Thanks!
 Amit


 On Thu, Apr 4, 2013 at 1:34 AM, xavier jmlucjav jmluc...@gmail.com
 wrote:

  A custom QueryResponseWriter...this makes sense, thanks Jack
 
 
  On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.com
  wrote:
 
   The search components can see the response as a namedlist, but it is
   only when SolrDispatchFIlter calls the QueryResponseWriter that XML or
  JSON
   or whatever other format (Javabin as well) is generated from the named
  list
   for final output in an HTTP response.
  
   You probably want a custom query response writer that wraps the XML
   response writer. Then you can generate the XML and then do whatever you
   want with it.
  
   The QueryResponseWriter class and queryResponseWriter in
  solrconfig.xml.
  
   -- Jack Krupansky
  
   -Original Message- From: xavier jmlucjav
   Sent: Wednesday, April 03, 2013 4:22 PM
   To: solr-user@lucene.apache.org
   Subject: do SearchComponents have access to response contents
  
  
   I need to implement some SearchComponent that will deal with metrics on
  the
   response. Some things I see will be easy to get, like number of hits
 for
   instance, but I am more worried with this:
  
   We need to also track the size of the response (as the size in bytes of
  the
   whole xml response that is streamed, with stored fields and all). I was
 a
   bit worried cause I am wondering if a searchcomponent will actually
 have
   access to the response bytes...
  
   Can someone confirm one way or the other? We are targeting Solr 4.0
  
   thanks
   xavier
  
 



Re: do SearchComponents have access to response contents

2013-04-04 Thread xavier jmlucjav
A custom QueryResponseWriter... this makes sense, thanks Jack.
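
The byte/char-counting part of that wrapper can be isolated into a plain FilterWriter (self-contained sketch; the class name is mine, and the Solr wiring, i.e. registering a QueryResponseWriter that hands this writer to the stock XML writer's write method, is omitted):

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Counts every character that streams through to the wrapped Writer.
// Inside a custom QueryResponseWriter you would wrap the Writer Solr hands
// you with this before delegating to the stock XML response writer.
public class CountingWriter extends FilterWriter {
    private long count = 0;
    public CountingWriter(Writer out) { super(out); }
    @Override public void write(int c) throws IOException {
        count++; super.write(c);
    }
    @Override public void write(char[] cbuf, int off, int len) throws IOException {
        count += len; super.write(cbuf, off, len);
    }
    @Override public void write(String str, int off, int len) throws IOException {
        count += len; super.write(str, off, len);
    }
    public long getCount() { return count; }

    public static void main(String[] args) throws IOException {
        CountingWriter w = new CountingWriter(new StringWriter());
        w.write("<response>ok</response>");
        System.out.println(w.getCount()); // prints 23
    }
}
```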


On Wed, Apr 3, 2013 at 11:21 PM, Jack Krupansky j...@basetechnology.comwrote:

 The search components can see the response as a namedlist, but it is
 only when SolrDispatchFIlter calls the QueryResponseWriter that XML or JSON
 or whatever other format (Javabin as well) is generated from the named list
 for final output in an HTTP response.

 You probably want a custom query response writer that wraps the XML
 response writer. Then you can generate the XML and then do whatever you
 want with it.

 The QueryResponseWriter class and queryResponseWriter in solrconfig.xml.

 -- Jack Krupansky

 -Original Message- From: xavier jmlucjav
 Sent: Wednesday, April 03, 2013 4:22 PM
 To: solr-user@lucene.apache.org
 Subject: do SearchComponents have access to response contents


 I need to implement some SearchComponent that will deal with metrics on the
 response. Some things I see will be easy to get, like number of hits for
 instance, but I am more worried with this:

 We need to also track the size of the response (as the size in bytes of the
 whole xml response that is streamed, with stored fields and all). I was a
 bit worried cause I am wondering if a searchcomponent will actually have
 access to the response bytes...

 Can someone confirm one way or the other? We are targeting Solr 4.0

 thanks
 xavier



do SearchComponents have access to response contents

2013-04-03 Thread xavier jmlucjav
I need to implement a SearchComponent that will deal with metrics on the
response. Some things, I can see, will be easy to get, like the number of
hits for instance, but I am more worried about this:

We need to also track the size of the response (the size in bytes of the
whole XML response that is streamed, with stored fields and all). I was a
bit worried because I am wondering if a SearchComponent will actually have
access to the response bytes...

Can someone confirm one way or the other? We are targeting Solr 4.0.

thanks
xavier


custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
I have the following setup:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="description" type="text" indexed="true"
       stored="true" multiValued="false" omitNorms="true"/>

I index my corpus, and I can see tf works as usual; in this doc the term
appears 14 times in this field:
4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
[DefaultSimilarity], result of:
  4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
0.14165252 = queryWeight, product of:
  10.0 = boost
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  0.0016648936 = queryNorm
31.834784 = fieldWeight in 440, product of:
  3.7416575 = tf(freq=14.0), with freq of:
14.0 = termFreq=14.0
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = fieldNorm(doc=440)


Then I modify my schema:

<similarity class="solr.SchemaSimilarityFactory"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="com.customsolr.NoTfSimilarityFactory"/>
</fieldType>

I just want to disable term freq > 1, so a term is either present or not.

public class NoTfSimilarity extends DefaultSimilarity {
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

But I still see tf=14 in my query??
723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
  85.08203 = queryWeight, product of:
10.0 = boost
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = queryNorm
  8.5082035 = fieldWeight in 440, product of:
1.0 = tf(freq=14.0), with freq of:
  14.0 = termFreq=14.0
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = fieldNorm(doc=440)

anyone sees what I am missing?
I am on solr4.0

thanks
xavier


Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Hi Felipe,

I need to keep positions, that is why I cannot just use
omitTermFreqAndPositions


On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti fla...@thoughtworks.com wrote:

 Do you really need a custom similarity?
 Did you try to put the attribute omitTermFreqAndPositions in your field?

 It could be:

 <field name="description" omitTermFreqAndPositions="true" type="text"
  indexed="true" stored="true" multiValued="false" omitNorms="true"/>

 http://wiki.apache.org/solr/SchemaXml


 On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav jmluc...@gmail.com
 wrote:

  I have the following setup:
 
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="description" type="text" indexed="true"
   stored="true" multiValued="false" omitNorms="true"/>
 
  I index my corpus, and I can see tf is as usual, in this doc is 14 times
 in
  this field:
  4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
  [DefaultSimilarity], result of:
4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
  0.14165252 = queryWeight, product of:
10.0 = boost
8.5082035 = idf(docFreq=30, maxDocs=56511)
0.0016648936 = queryNorm
  31.834784 = fieldWeight in 440, product of:
3.7416575 = tf(freq=14.0), with freq of:
  14.0 = termFreq=14.0
8.5082035 = idf(docFreq=30, maxDocs=56511)
1.0 = fieldNorm(doc=440)
 
 
  Then I modify my schema:
 
  <similarity class="solr.SchemaSimilarityFactory"/>
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <similarity class="com.customsolr.NoTfSimilarityFactory"/>
  </fieldType>
 
  I just want to disable term freq > 1, so a term is either present or
 not.
 
  public class NoTfSimilarity extends DefaultSimilarity {
      public float tf(float freq) {
          return freq > 0 ? 1.0f : 0.0f;
      }
  }
 
  But I still see tf=14 in my query??
  723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
  723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
85.08203 = queryWeight, product of:
  10.0 = boost
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = queryNorm
8.5082035 = fieldWeight in 440, product of:
  1.0 = tf(freq=14.0), with freq of:
14.0 = termFreq=14.0
  8.5082035 = idf(docFreq=30, maxDocs=56511)
  1.0 = fieldNorm(doc=440)
 
  anyone sees what I am missing?
  I am on solr4.0
 
  thanks
  xavier
 



 --
 Felipe Lahti
 Consultant Developer - ThoughtWorks Porto Alegre



Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Steve,

yes, as I already included (though maybe it is not very visible), I have
this before the types element:
<similarity class="solr.SchemaSimilarityFactory"/>

I can see the explain info is indeed different; for example, I have []
instead of [DefaultSimilarity].

thanks



On Thu, Mar 21, 2013 at 3:08 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

 Have you set the global similarity to solr.SchemaSimilarityFactory?

 See http://wiki.apache.org/solr/SchemaXml#Similarity.

 Steve

 On Mar 21, 2013, at 9:44 AM, xavier jmlucjav jmluc...@gmail.com wrote:

  Hi Felipe,
 
  I need to keep positions, that is why I cannot just use
  omitTermFreqAndPositions
 
 
  On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti fla...@thoughtworks.com
 wrote:
 
  Do you really need a custom similarity?
  Did you try to put the attribute omitTermFreqAndPositions in your
 field?
 
  It could be:
 
  <field name="description" omitTermFreqAndPositions="true" type="text"
   indexed="true" stored="true" multiValued="false" omitNorms="true"/>
 
  http://wiki.apache.org/solr/SchemaXml
 
 
  On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav jmluc...@gmail.com
  wrote:
 
  I have the following setup:
 
  <fieldType name="text" class="solr.TextField"
   positionIncrementGap="100">
  <analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  </fieldType>
  <field name="description" type="text" indexed="true"
   stored="true" multiValued="false" omitNorms="true" />
 
  I index my corpus, and I can see tf is as usual, in this doc is 14
 times
  in
  this field:
  4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
  [DefaultSimilarity], result of:
   4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
 0.14165252 = queryWeight, product of:
   10.0 = boost
   8.5082035 = idf(docFreq=30, maxDocs=56511)
   0.0016648936 = queryNorm
 31.834784 = fieldWeight in 440, product of:
   3.7416575 = tf(freq=14.0), with freq of:
 14.0 = termFreq=14.0
   8.5082035 = idf(docFreq=30, maxDocs=56511)
   1.0 = fieldNorm(doc=440)
 
 
  Then I modify my schema:
 
 <similarity class="solr.SchemaSimilarityFactory"/>
 <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
 <analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 </analyzer>
 <similarity class="com.customsolr.NoTfSimilarityFactory"/>
 </fieldType>
 
   I just want to disable term freq > 1, so a term is either present or
   not.
 
   public class NoTfSimilarity extends DefaultSimilarity {
       public float tf(float freq) {
           return freq > 0 ? 1.0f : 0.0f;
       }
   }
 
  But I still see tf=14 in my query??
  723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result
 of:
 723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product
 of:
   85.08203 = queryWeight, product of:
 10.0 = boost
 8.5082035 = idf(docFreq=30, maxDocs=56511)
 1.0 = queryNorm
   8.5082035 = fieldWeight in 440, product of:
 1.0 = tf(freq=14.0), with freq of:
   14.0 = termFreq=14.0
 8.5082035 = idf(docFreq=30, maxDocs=56511)
 1.0 = fieldNorm(doc=440)
 
  anyone sees what I am missing?
  I am on solr4.0
 
  thanks
  xavier
 
 
 
 
  --
  Felipe Lahti
  Consultant Developer - ThoughtWorks Porto Alegre
 




Re: custom similary on a field not working

2013-03-21 Thread xavier jmlucjav
Damn... I was fixated on seeing the 14 there... I had naively thought that
the term freq would not be stored in the doc, only 1 would be stored, but I
guess it still stores the real value and then applies the custom similarity
at query time.

That means changing to a custom similarity does not need reindexing right?

thanks for the help!
xavier


On Thu, Mar 21, 2013 at 5:26 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 :  public class NoTfSimilarity extends DefaultSimilarity {
 :  public float tf(float freq) {
 :  return freq > 0 ? 1.0f : 0.0f;
 :  }
 :  }
 ...

 :  But I still see tf=14 in my query??
 ...
 :  1.0 = tf(freq=14.0), with freq of:
 :14.0 = termFreq=14.0

 pretty sure you are looking at the explanation of the *input* to your tf()
 function; note that the *output* is 1.0, just like in your function.

 Did you compare this to what you see using the DefaultSimilarity?



 -Hoss
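Hoss's point can be checked outside Lucene entirely. A minimal standalone sketch (the `NoTfCheck` class name and the `main` harness are illustrative, not from the thread) isolating the override logic, which returns 1.0 for any positive freq:

```java
// Standalone check of the tf() override logic from the thread; no Lucene
// classes are needed to verify the math itself.
public class NoTfCheck {
    // Same body as the NoTfSimilarity.tf() override above
    static float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // The explain output prints the *input* freq (14.0); the value
        // actually used in the score product is the *output*:
        System.out.println(tf(14.0f)); // 1.0
        System.out.println(tf(0.0f));  // 0.0
    }
}
```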



Re: 4.0 hanging on startup on Windows after Control-C

2013-03-18 Thread xavier jmlucjav
Hi Shawn,

I am using DIH with commit at the end...I'll investigate further to see if
this is what is happening and will report back, also will check 4.2 (that I
had to do anyway...).
thanks for your input
xavier


On Mon, Mar 18, 2013 at 6:12 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/17/2013 11:51 AM, xavier jmlucjav wrote:

 Hi,

 I have an index where, if I kill solr via Control-C, it consistently hangs
 next time I start it. Admin does not show cores, and searches never
 return.
 If I delete the index contents and I restart again all is ok. I am on
 windows 7, jdk1.7 and Solr4.0.
 Is this a known issue? I looked in jira but found nothing.


 I scanned your thread dump.  Nothing jumped out at me, but given my
 inexperience with such things, I'm not surprised by that.

 Have you tried 4.1 or 4.2 yet to see if the problem persists?  4.0 is no
 longer the new hotness.

 Below I will discuss the culprit that springs to mind, though I don't know
 whether it's what you are actually hitting.

 One thing that can make Solr take a really long time to start up is huge
 transaction logs.  Transaction logs must be replayed when Solr starts, and
 if they are huge, it can take a really long time.

 Do you have tlog directories in your cores (in the data dir, next to the
 index directory), and if you do, how much disk space do they use?  The
 example config in 4.x has updateLog turned on.

 There are two common situations that can lead to huge transaction logs.
  One is exclusively using soft commits when indexing, the other is running
 a very large import with the dataimport handler and not committing until
 the very end.

 AutoCommit with openSearcher=false is a good solution to both of these
 situations.  As long as you use openSearcher=false, it will not change what
 documents are visible.  AutoCommit does a regular hard commit every X new
 documents or every Y milliseconds.  A hard commit flushes index data to
 disk and starts a new transaction log.  Solr will only keep a few
 transaction logs around, so frequently building new ones keeps their size
 down.  When you restart Solr, you don't need to wait for a long time while
 it replays them.
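 The autoCommit setup described above looks roughly like this in
 solrconfig.xml (the maxDocs/maxTime values are illustrative only; tune
 them to your indexing rate):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog/>
  <!-- hard commit: flushes index data and starts a new tlog, but with
       openSearcher=false it does not change which docs are visible -->
  <autoCommit>
    <maxDocs>25000</maxDocs>
    <maxTime>300000</maxTime> <!-- 5 minutes -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```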

 Thanks,
 Shawn




Re: Is there an EdgeSingleFilter already?

2013-03-17 Thread xavier jmlucjav
Steve, worked like a charm.
thanks!


On Sun, Mar 17, 2013 at 7:37 AM, Steve Rowe sar...@gmail.com wrote:

 See https://issues.apache.org/jira/browse/LUCENE-4843

 Let me know if it works for you.

 Steve

 On Mar 16, 2013, at 5:35 PM, xavier jmlucjav jmluc...@gmail.com wrote:

   I read your reply too fast, so I thought you meant configuring the
   LimitTokenPositionFilter. I see you mean I have to write one, ok...
 
 
 
  On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav jmluc...@gmail.com
 wrote:
 
  Steve,
 
   Yes, I want only "one", "one two", and "one two three", but nothing
  else.
  Cool if this can be achieved without java code even better, I'll check
 that
  filter.
 
  I need this for building a field used for suggestions, the user
   specifically wants matches only from the edge.
 
  thanks!
 
  On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:
 
  Hi xavier,
 
   It's not clear to me what you want.  Is the "edge" you're referring to
   the beginning of a field? E.g. raw text "one two three four" with
   EdgeShingleFilter configured to produce unigrams, bigrams and trigrams
  would
   produce "one", "one two", and "one two three", but nothing else?
 
  If so, I suspect writing a LimitTokenPositionFilter (which would stop
  emitting tokens after the token position exceeds a specified limit)
 would
  be better, rather than subclassing ShingleFilter.  You could use
   LimitTokenCountFilter as a model, especially its consumeAllTokens
 option.
  I think this would make a nice addition to Lucene.
 
  Also, what do you plan to use this for?
 
  Steve
 
  On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com
 wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the
 edge.
 
  I want to confirm there is no way to get this feature without
  subclassing
  ShingleFilter, cause I thought someone would have already encountered
  this
  use case
 
  thanks
  xavier
 
 
 




4.0 hanging on startup on Windows after Control-C

2013-03-17 Thread xavier jmlucjav
Hi,

I have an index where, if I kill solr via Control-C, it consistently hangs
next time I start it. Admin does not show cores, and searches never return.
If I delete the index contents and I restart again all is ok. I am on
windows 7, jdk1.7 and Solr4.0.
Is this a known issue? I looked in jira but found nothing.
xavier

Here is a thread dump:

2013-03-17 17:58:33
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):

JMX server connection timeout 30 daemon prio=6 tid=0x0bbf9000
nid=0x3b4c in Object.wait() [0x1df3e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xe7054338 (a [I)
at
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:168)
- locked 0xe7054338 (a [I)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

RMI Scheduler(0) daemon prio=6 tid=0x0bbf8000 nid=0x39d8 waiting
on condition [0x1db9f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0xb9e1e6d8 (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

RMI TCP Connection(1)-192.168.1.128 daemon prio=6 tid=0x0bbf7800
nid=0x111c runnable [0x1dd3e000]
   java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
- locked 0xe70003c8 (a java.io.BufferedInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- 0xb959bc68 (a
java.util.concurrent.ThreadPoolExecutor$Worker)

RMI TCP Accept-0 daemon prio=6 tid=0x0bbf5000 nid=0x1fe0 runnable
[0x1da4e000]
   java.lang.Thread.State: RUNNABLE
at java.net.DualStackPlainSocketImpl.accept0(Native Method)
at
java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:121)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:183)
- locked 0xb9531a78 (a java.net.SocksSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:522)
at java.net.ServerSocket.accept(ServerSocket.java:490)
at
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:387)
at
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:359)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

DestroyJavaVM prio=6 tid=0x0bbf6800 nid=0x60c waiting on
condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

searcherExecutor-6-thread-1 prio=6 tid=0x0bbf6000 nid=0x3480 in
Object.wait() [0x1441e000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xb9e6a4a0 (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1379)
- locked 0xb9e6a4a0 (a java.lang.Object)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1200)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1135)
at

Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
Steve,

Yes, I want only "one", "one two", and "one two three", but nothing else.
Cool if this can be achieved without java code even better, I'll check that
filter.

I need this for building a field used for suggestions, the user
specifically wants matches only from the edge.

thanks!

On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

  It's not clear to me what you want.  Is the "edge" you're referring to the
  beginning of a field? E.g. raw text "one two three four" with
  EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would
  produce "one", "one two", and "one two three", but nothing else?

 If so, I suspect writing a LimitTokenPositionFilter (which would stop
 emitting tokens after the token position exceeds a specified limit) would
 be better, rather than subclassing ShingleFilter.  You could use
  LimitTokenCountFilter as a model, especially its consumeAllTokens option.
  I think this would make a nice addition to Lucene.

 Also, what do you plan to use this for?

 Steve

 On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the edge.
 
  I want to confirm there is no way to get this feature without subclassing
  ShingleFilter, cause I thought someone would have already encountered
 this
  use case
 
  thanks
  xavier




Re: Is there an EdgeSingleFilter already?

2013-03-16 Thread xavier jmlucjav
I read your reply too fast, so I thought you meant configuring the
LimitTokenPositionFilter. I see you mean I have to write one, ok...



On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav jmluc...@gmail.com wrote:

 Steve,

 Yes, I want only "one", "one two", and "one two three", but nothing else.
 Cool if this can be achieved without java code even better, I'll check that
 filter.

 I need this for building a field used for suggestions, the user
 specifically wants matches only from the edge.

 thanks!

 On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe sar...@gmail.com wrote:

 Hi xavier,

  It's not clear to me what you want.  Is the "edge" you're referring to
  the beginning of a field? E.g. raw text "one two three four" with
  EdgeShingleFilter configured to produce unigrams, bigrams and trigrams would
  produce "one", "one two", and "one two three", but nothing else?

 If so, I suspect writing a LimitTokenPositionFilter (which would stop
 emitting tokens after the token position exceeds a specified limit) would
 be better, rather than subclassing ShingleFilter.  You could use
  LimitTokenCountFilter as a model, especially its consumeAllTokens option.
  I think this would make a nice addition to Lucene.

 Also, what do you plan to use this for?

 Steve

 On Mar 16, 2013, at 5:02 PM, xavier jmlucjav jmluc...@gmail.com wrote:
  Hi,
 
  I need to use shingles but only keep the ones that start from the edge.
 
  I want to confirm there is no way to get this feature without
 subclassing
  ShingleFilter, cause I thought someone would have already encountered
 this
  use case
 
  thanks
  xavier





[ANN] vifun: a GUI to help visually tweak Solr scoring, release 0.6

2013-03-10 Thread xavier jmlucjav
Hi,

I am releasing an new version (0.6) of vifun, a GUI to help visually tweak
Solr scoring. Most relevant changes are:
- support float values
- add support for tie
- sync both Current/Baseline scrollbars (if the corresponding checkbox is
selected)
- double-click on a doc: show a side-by-side comparison of debug score info
- upgrade to Griffon 1.2.0
- allow using another handler (besides /select)

You can check it out here: https://github.com/jmlucjav/vifun
Binary distribution:
http://code.google.com/p/vifun/downloads/detail?name=vifun-0.6.zip

xavier


Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-03-04 Thread xavier jmlucjav
Hi Mark,

Thanks for trying it out.

Let me see if I can explain it better: the number you have to select (in
order to be able to tweak it later with the slider) is any number that
appears in one of the parameters in the Scoring section.

The issue you have, is that you are using /select handler from the example
distribution, and that handler does not have any of these parameters (qf,
pf, pf2, pf3, ps, ps2, ps3, bf, bq, boost, mm, tie), so it's normal they
don't show up, there is nothing to tweak...

In the example configuration from 4.1, you can select /browse handler, as
it uses qf and mm, and you should be able to tweak them. Of course If you
were using a real Solr installation with a sizable number of documents and
some complex usage of edismax, you would be able to see much better what
the tool can do.

xavier


On Mon, Mar 4, 2013 at 10:52 PM, Mark Bennett
mark.benn...@lucidworks.com wrote:

 Hello Xavier,

 Thanks for uploading this and sharing.  I also read the other messages in
 the thread.

 I'm able to get part way through your Getting Started section, I get
 results, but I get stuck on the editing values.  I've tried with Java 6 and
 7, with both the 0.5 binary and from the source distribution.

 What's working:
 * Default Solr 4.1 install  (plus a couple extra fields in schema)
 * Able to connect to Solr (/collection1)
 * Able to select handler (/select)
 * Able to run a search:
   q=bandwidth
   rows=10
   fl=title
   rest: pt=45.15,-93.85 (per your example)
 * Get 2 search results with titles
 * Able to select a result, mouse over, highlight score, etc.

 However, what I'm stuck on:
 * Below the Run Query button, I only see the grayed out Scoring slider.
 * The instructions say to highlight some numbers
    - I tried highlighting the 10 in the rows parameter
   - I also tried the 45.15 in rest, and some of the scores in the
 results list

 I never see the extra parameters you show in this screen shot:

 https://raw.github.com/jmlucjav/vifun/master/img/screenshot-selecttarget.jpg
  I see the word "Scoring:"
  I don't see the blue text "Select a number as a target to tweak"
 I don't see the parameters qf, bf_0, 1, 2, bq_0, etc.

 I'm not sure how to get those extra fields to appear in the UI.

 I also tried adding defType=edismax, no luck

 The Handlers it sees:
 /select, /query, /browse, /spell, /tvrh, /clustering, /terms,
 /elevate
 (from default Solr 4.1 solrconfig.xml)
 I'm using /select


 --
 Mark Bennett / LucidWorks: Search & Big Data / mark.benn...@lucidworks.com
 Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513







 On Feb 23, 2013, at 6:12 AM, jmlucjav jmluc...@gmail.com wrote:

  Hi,
 
  I have built a small tool to help me tweak some params in Solr (typically
  qf, bf in edismax). As maybe others find it useful, I am open sourcing it
  on github: https://github.com/jmlucjav/vifun
 
  Check github for some more info and screenshots. I include part of the
  github page below.
  regards
 
  Description
 
   Did you ever spend lots of time trying to tweak all the numbers in an
  *edismax*
   handler's *qf*, *bf*, etc. params so docs get scored to your liking? Imagine
  you have the params below, is 20 the right boosting for *name* or is it
 too
  much? Is *population* being boosted too much versus distance? What about
  new documents?
 
  <!-- fields, boost some -->
  <str name="qf">name^20 textsuggest^10 edge^5 ngram^2 phonetic^1</str>
  <str name="mm">33%</str>
  <!-- boost closest hits -->
  <str name="bf">recip(geodist(),1,500,0)</str>
  <!-- boost by population -->
  <str name="bf">product(log(sum(population,1)),100)</str>
  <!-- boost newest docs -->
  <str name="bf">recip(rord(moddate),1,1000,1000)</str>
 
  This tool was developed in order to help me tweak the values of boosting
   functions etc. in Solr, typically when using the edismax handler. If you
  are fed
   up with changing a number a bit, restarting Solr, and running the same
   query to see how documents are scored now... then this tool is for you.
   Features (https://github.com/jmlucjav/vifun#features)
 
- Can tweak numeric values in the following params: *qf, pf, bf, bq,
boost, mm* (others can be easily added) even in *appends or
invariants*
- View side by side a Baseline query result and how it changes when you
gradually change each value in the params
- Colorized values, color depends on how the document does related to
baseline query
- Tooltips give you Explain info
- Works on remote Solr installations
- Tested with Solr 3.6, 4.0 and 4.1 (other versions would work too, as
long as wt=javabin format is compatible)
- Developed using Groovy/Griffon
 
   Requirements (https://github.com/jmlucjav/vifun#requirements)
 
- */select* handler should be available, and not have any *appends or
invariants*, as it could interfere with how vifun works.
- Java6 is needed (maybe it runs on Java5 too). A JRE should be enough.