Re: Data Import from a Queue

2011-07-20 Thread Stefan Matheis

Brandon,

I don't know how they are using it in detail, but part of Chef's
architecture is this:


Chef Server -> RabbitMQ -> Chef Solr Indexer -> Solr
http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png

Perhaps not exactly what you're looking for, but it may give you an idea?

Regards
Stefan

On 19.07.2011 19:04, Brandon Fish wrote:

Let me provide some more details to the question:

I was unable to find any example implementations where individual documents
(single document per message) are read from a message queue (like ActiveMQ
or RabbitMQ) and then added to Solr via SolrJ, an HTTP POST or another
method. Does anyone know of any available examples for this type of import?

If no examples exist, what would be a recommended commit strategy for
performance? My best guess for this would be to have a queue per core and
commit once the queue is empty.
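
For concreteness, a rough and untested sketch of the consumer loop I have in
mind, using the RabbitMQ Java client and SolrJ (the queue name, core URL, and
payload-to-document mapping are all made up):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.GetResponse;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueueImporter {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");

        boolean uncommitted = false;
        while (true) {
            GetResponse msg = channel.basicGet("solr-import", false);
            if (msg == null) {                // queue is empty
                if (uncommitted) {
                    solr.commit();            // commit everything added so far
                    uncommitted = false;
                }
                Thread.sleep(1000);           // poll again in a second
                continue;
            }
            SolrInputDocument doc = new SolrInputDocument();
            // one document per message; parse your real payload here
            doc.addField("id", new String(msg.getBody(), "UTF-8"));
            solr.add(doc);                    // add only, no commit per message
            uncommitted = true;
            channel.basicAck(msg.getEnvelope().getDeliveryTag(), false);
        }
    }
}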

Thanks.

On Mon, Jul 18, 2011 at 6:52 PM, Erick Erickson erickerick...@gmail.com wrote:


This is a really cryptic problem statement.

you might want to review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fish brandon.j.f...@gmail.com
wrote:

Does anyone know of any existing examples of importing data from a queue
into Solr?

Thank you.







Re: how to get solr core information using solrj

2011-07-20 Thread Stefan Matheis

Jiang,

what about http://wiki.apache.org/solr/CoreAdmin#STATUS ?
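
From SolrJ, something like this should map onto that STATUS call (an untested
sketch; the URL is made up, and note that CoreAdmin is addressed at the
container root rather than at a core):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class CoreStatus {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // null asks for the status of all cores; pass "core0" for a single one
        CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
        System.out.println(status.getCoreStatus());
    }
}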

Regards
Stefan

On 20.07.2011 05:40, Jiang mingyuan wrote:

hi all,

Our Solr server contains two cores, core0 and core1, and they both work well.

Now I'm trying to find a way to get information about core0 and core1.

Can SolrJ or another API do this?


thanks very much.



suggester component from trunk throwing error

2011-07-20 Thread abhayd
Hi,
I am trying to configure the suggester component. I downloaded Solr from trunk
and did a build.

Here is my config:

  <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">name_autocomplete</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

When I build my index, the index gets created, but I get the following
exception:
Jul 20, 2011 2:32:00 AM
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener
buildSpellIndex
INFO: Building spell index for spell checker: suggest
Jul 20, 2011 2:32:00 AM org.apache.solr.spelling.suggest.Suggester build
INFO: build()
Jul 20, 2011 2:32:00 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchMethodError:
org.apache.lucene.index.IndexReader.fields()Lorg/apache/lucene/index/Fields;
at
org.apache.lucene.index.MultiFields.getFields(MultiFields.java:64)
at
org.apache.lucene.index.MultiFields.getFields(MultiFields.java:69)
at
org.apache.lucene.index.MultiFields.getTerms(MultiFields.java:142)
at
org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.init(HighFrequencyDictionary.java:65)
at
org.apache.lucene.search.spell.HighFrequencyDictionary.getWordsIterator(HighFrequencyDictionary.java:54)
at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:63)
at
org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:136)
at
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.buildSpellIndex(SpellCheckComponent.java:373)
at
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:358)
at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Any help?




Re: - character in search query

2011-07-20 Thread roySolr
Here is my complete fieldtype:

<fieldType name="name" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|, "/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="-" replacement=""/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

In the Field Analysis page I see that the - is removed by the
PatternReplaceFilter. When I escape the term ($q =
SolrUtils::escapeQueryChars($q);) I see in my debugQuery something like
this (term = arsenal - london):

+((DisjunctionMaxQuery((name:arsenal)~1.0) DisjunctionMaxQuery((name:\
london~1.0))~2) ()

When I don't escape the query I get something like this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
-DisjunctionMaxQuery((name:london)~1.0))~1) ()

The - in my term is treated as a negative operator (-DisjunctionMaxQuery).
How can I fix this problem? What is the easiest way?





Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Can anyone throw some light on this issue?

My problem is: I want to give a query-time boost to certain documents, which
have a field, say field1, in the range that the user chooses at query time. I
think the link below indicates a range query:

http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10

But, apart from that, how can I indicate a boost for the condition
field1:[10%20TO%2030]?

I tried using bq=field1:[20 TO 25] and also bq=field1:[20 TO 25]^10,
but I am not able to figure out what these two mean from the results,
because I get the top result as a document where field1 is 40 in this
case, after using the bq clause. I increased the boost to 10, 20, 50, 100,
but the results don't change at all.

S.

On Tue, Jul 19, 2011 at 4:28 PM, Sowmya V.B. vbsow...@gmail.com wrote:

 Hi

 Is query time boosting possible in Solr?

 Here is what I want to do: I want to boost the ranking of certain
 documents, which have their relevant field values, in a particular range
 (selected by user at query time)...

 when I do something like:

 http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper&fq=field1:[10%20TO%2030]&start=0&rows=10
 -I guess, it is just a filter over the normal results and not exactly a
 query.

 I tried giving this:

 http://localhost:8085/solr/select?indent=on&version=2.2&q=scientific+temper+field1:[10%20TO%2030]&start=0&rows=10
 -This still worked and gave me different results. But, I did not quite
 understand what this second query meant. Does it mean: Rank those documents
 with field1 value in 10-30 better than those without ?

 S
 --
 Sowmya V.B.
 
 Losing optimism is blasphemy!
 http://vbsowmya.wordpress.com
 




-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Solr UI

2011-07-20 Thread Gora Mohanty
On Tue, Jul 19, 2011 at 7:51 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 There are several starting points for Solr UI out there, but really the best 
 choice is whatever fits your environment and the skills/resources you have 
 handy.  Here are a few off the top of my head -
[...]

Besides these excellent examples, if you are looking at Python/Django,
Haystack works well as a starting point, though:
* One does have to build a template/view architecture around it,
  but that is fairly easy to do.
* Haystack allows multiple search back-ends, and while that is
  convenient for starting out, it does not implement some Solr
  features. E.g., one big missing item is support for multi-core
  Solr.

Regards,
Gora


Re: any detailed tutorials on plugin development?

2011-07-20 Thread Gora Mohanty
On Wed, Jul 20, 2011 at 6:29 AM, deniz denizdurmu...@gmail.com wrote:
 Gosh, sorry for the typo in my first msg... I just realized it now... well
 anyway...

 I would like to find a detailed tutorial about how to implement an analyzer
 or a request handler plugin... but all I have got is nothing from the
 documentation on the Solr wiki...

Does this not help: http://wiki.apache.org/solr/SolrPlugins ?

Google also turns up multiple examples, e.g.,
http://e-mats.org/2008/06/writing-a-solr-analysis-filter-plugin/
I remember using that blog as a starting point for writing
a custom plugin.

Regards,
Gora


Re: - character in search query

2011-07-20 Thread roySolr
When I use the edismax handler the escaping works great (before, I used the
dismax handler). The debugQuery shows me this:

+((DisjunctionMaxQuery((name:arsenal)~1.0)
DisjunctionMaxQuery((name:london)~1.0))~2

The \ is not in the parsed query, so I get the results I wanted. I don't
know why the dismax handler works this way.

Can someone tell me the difference between the dismax and edismax handlers?





Re: any detailed tutorials on plugin development?

2011-07-20 Thread samuele.mattiuzzo
Actually, I'm rewriting the http://wiki.apache.org/solr/UpdateRequestProcessor
wiki page with a more detailed how-to. It will be ready and online
after I get back from work!



term positions performance

2011-07-20 Thread Marco Martinez
Hi,

I am developing a new term-proximity query and I am using term positions
to get the positions of each term. I want to know if there are any ways to
improve the performance of using term positions, at index time or at query
time. All the fields on which I use term positions are indexed.
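
For reference, this is roughly how I read the positions today (a simplified
sketch; the field and term are made up):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PositionsSketch {
    static void collectPositions(IndexReader reader) throws IOException {
        TermPositions tp = reader.termPositions(new Term("text", "solr"));
        while (tp.next()) {                  // advance to the next matching doc
            int doc = tp.doc();
            for (int i = 0; i < tp.freq(); i++) {
                int pos = tp.nextPosition(); // occurrence position within doc
                // ... feed doc/pos into the proximity computation ...
            }
        }
        tp.close();
    }
}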

Thanks in advance,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Re: term positions performance

2011-07-20 Thread Marco Martinez
Also, I implemented this query as a function query; I wonder if doing it as a
normal query would improve performance.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Marco Martinez mmarti...@paradigmatecnologico.com

 Hi,

 I am developing a new term-proximity query and I am using term positions
 to get the positions of each term. I want to know if there are any ways to
 improve the performance of using term positions, at index time or at query
 time. All the fields on which I use term positions are indexed.

 Thanks in advance,

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42



Re: POST VS GET and NON English Characters

2011-07-20 Thread Sujatha Arun
Paul,

I added the following line to catalina.sh and restarted the server, but this
does not seem to help:

JAVA_OPTS="-Djavax.servlet.request.encoding=UTF-8 -Dfile.encoding=UTF-8"
Regards
Sujatha

On Sun, Jul 17, 2011 at 3:51 AM, Paul Libbrecht p...@hoplahup.net wrote:

 If you have the option, try setting the default charset of the
 servlet-container to utf-8.
 Typically this is done by setting a system property on startup.

 My experience has been that the default used to be utf-8, but it is less and
 less so, and sometimes in a surprising way!

 paul


 On 16 July 2011, at 05:34, Sujatha Arun wrote:

  It works fine with the GET method, but I am wondering why it does not with
  the POST method.
 
  2011/7/15 pankaj bhatt panbh...@gmail.com
 
  Hi Arun,
  This looks like an encoding issue to me.
  Can you change your browser settings to UTF-8 and hit the search URL
  via the GET method?

  We faced a similar problem with Chinese and Korean languages; this
  solved the problem.
 
  / Pankaj Bhatt.
 
  2011/7/15 Sujatha Arun suja.a...@gmail.com
 
  Hello,
 
  We have implemented Solr search in several languages. Initially we used
  the GET method for querying, but later moved to the POST method to
  accommodate lengthy queries.

  When we moved from GET to POST, the German characters could no longer
  be searched, and I had to use the function utf8_decode in my application
  for the search to work for German characters.

  Currently I am doing this while querying using the POST method; we are
  using the standard request handler:

  $this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
  $this->_queryterm);

  This makes the query work for German characters and other languages, but
  it does not work for certain characters in Lithuanian and Spanish. Example:

  Not working:
   - Iš
   - Estremadūros
   - sNaująjį
   - MEDŽIAGOTYRA
   - MEDŽIAGOS
   - taškuose

  Working:
   - garbę
   - ieškoti
   - ispanų

  Any ideas/input?
 
  Regards
  Sujatha
 
 




Re: embeded solrj doesn't refresh index

2011-07-20 Thread Marco Martinez
You should send a commit to your embedded Solr.
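
A minimal sketch, assuming the usual embedded setup (core name made up):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedCommit {
    public static void main(String[] args) throws Exception {
        CoreContainer container = new CoreContainer.Initializer().initialize();
        EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "core0");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        solr.add(doc);
        solr.commit(); // without this, an already-open searcher keeps the old view
        container.shutdown();
    }
}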

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai j...@huawei.com

 Hi,



 I am using embedded SolrJ. After I add a new doc to the index, I can see the
 changes through the Solr web interface, but not from embedded SolrJ. But after
 I restart the embedded SolrJ application, I do see the changes. It works as if
 there were a cache. Does anyone know the problem? Thanks.



 Jianbin




Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomás

Thanks for a quick response.

So, if I say:
http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
Will it be right?

The above query boosts the documents which match the given query
(scientific) and have Field1 values between 20 and 25, by a factor of 10.
Is that right?

S

2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Sowmya, bq is a great way of boosting, but you have to be using the
 Dismax Query Parser or the Extended Dismax (edismax) query parser; it
 doesn't work with the Lucene Query Parser. If you can use either of those,
 then that's the solution. If you need to use the Lucene Query Parser, for a user
 query like:

 scientific temper

 you could create a query like:

 (scientific temper) OR (scientific temper AND (field1:[10 TO 30]))^X

 where X is the boost you want for those documents.

 With your query:
 scientific temper field1:[10 TO 30]

 you are either adding the condition of the range value for the field (if
 your default operator is AND) or adding another way of matching the query
 (if your default operator is OR: you can have documents in your result set
 that only matched the range query, and this is not what the user wanted).

 Hope this helps,

 Tomás





-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: query time boosting in solr

2011-07-20 Thread Tomás Fernández Löbbe
Yes, it should, but make sure you specify at least the qf parameter for
dismax. You can activate debugQuery and you'll see which documents get
boosted and which don't.

On Wed, Jul 20, 2011 at 9:21 AM, Sowmya V.B. vbsow...@gmail.com wrote:

 Hi Tomás

 Thanks for a quick response.

 So, if I say:
 http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2025]^10&start=0&rows=30
 Will it be right?

 The above query boosts the documents which match the given query
 (scientific) and have Field1 values between 20 and 25, by a factor of 10.
 Is that right?

 S




 --
 Sowmya V.B.
 
 Losing optimism is blasphemy!
 http://vbsowmya.wordpress.com
 



Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-20 Thread mdz-munich
Update.

After adding 1626 documents without doing a commit or optimize:

Exception in thread "Lucene Merge Thread #1"
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Map
failed
at
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:517)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
at
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
at
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 10 more

Any ideas, any suggestions?

Greetz & thank you,

Sebastian





Curl Tika not working with blanks in literal.field

2011-07-20 Thread Peralta Gutiérrez del Álamo

Hi,
I'm trying to index binary documents with curl and Tika for extracting the text.

The problem is that when I set the value of a field to something containing
blank spaces, using the input parameter literal.fieldname=value, the
document is not indexed.

The command I send is the following:

curl http://localhost:8983/solr/update/extract?literal.id=doc1\&literal.url=/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc\&uprefix=attr_\&fmap.content=text\&commit=true -F myfile=\@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc

That is, literal.url=value with blanks apparently is not working.
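
For comparison, the variant I would try next, with the blanks percent-encoded
and the whole URL quoted, since a raw space ends the URL as far as curl and
HTTP are concerned (untested):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.url=/mnt/windows/Ofertas/2006%20Portal%20Intranet/DOCUMENTACION/datos.doc&uprefix=attr_&fmap.content=text&commit=true" -F myfile=@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc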
  
  

Re: defType argument weirdness

2011-07-20 Thread Yonik Seeley
On Tue, Jul 19, 2011 at 11:41 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Is it generally recognized that this terminology is confusing, or is it just 
 me?

 I do understand what they do (at least well enough to use them), but I find 
 it confusing that it's called defType as a main param, but type in a 
 LocalParam

When used as the main param, it is still just the default (i.e. it may
be overridden).
For example: defType=lucene&q={!func}1

 (and then there's 'qt', often confused with defType/type by newbies, since 
 they guess it stands for 'query type', but which should probably actually 
 have been called 'requestHandler'/'rh' instead, since that's what it actually 
 chooses, no?  It gets very confusing).

Yeah, qt is very historical... before the QParserPlugin framework,
and before request handlers were used for many other things (including
updates).

-Yonik
http://www.lucidimagination.com


 If it's generally recognized it's confusing and perhaps a somewhat 
 inconsistent mental model being implied, I wonder if there'd be any interest 
 in renaming these to be more clear, leaving the old ones as aliases/synonyms 
 for backwards compatibility (perhaps with a long deprecation period, or 
 perhaps existing forever). I know it was very confusing to me to keep track 
 of these parameters and what they did for quite a while, and still trips me 
 up from time to time.

 Jonathan
 
 From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
 [yo...@lucidimagination.com]
 Sent: Tuesday, July 19, 2011 9:40 PM
 To: solr-user@lucene.apache.org
 Subject: Re: defType argument weirdness

 On Tue, Jul 19, 2011 at 1:25 PM, Naomi Dushay ndus...@stanford.edu wrote:
 Regardless, I thought that defType=dismax&q=*:* is supposed to be
 equivalent to q={!defType=dismax}*:* and also equivalent to q={!dismax}*:*

 Not quite - there is a very subtle distinction.

 {!dismax}  is short for {!type=dismax}, the type of the actual query,
 and this may not be overridden.

 The defType local param is only the default type for sub-queries (as
 opposed to the current query).
 It's useful in conjunction with the query  or nested query qparser:
 http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html

 -Yonik
 http://www.lucidimagination.com



Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomas

Here is what I was trying to give.

http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on

Over here, I was trying to change the range of Field1, keeping everything
else intact. Here are my observations:

1) The number of results found remains the same; only the order of the
results varies.
2) The boost factor (10) does not seem to have any influence at all.

Here is what the debugQuery says:
<str name="parsedquery">+DisjunctionMaxQuery((text:scientif)) () Field1:[20.0 TO 30.0]^10.0</str>
<str name="parsedquery_toString">+(text:scientif) () Field1:[20.0 TO 30.0]^10.0</str>

From these, it seems like it is just filtering the results based on the Field1
values, rather than performing a boost query.

S.

2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com

 Yes, it should,  but make sure you specify at least the qf parameter for
 dismax. You can activate debugQuery and you'll see which documents get
 boosted and which aren't.


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks for responding so quickly, I don't mind waiting a bit.  I'll
hang out until the updates have been made.  Thanks again.

On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote:
 Hi Jamie.
 I work on LSP; it can index polygons and query for them. Although the
 capability is there, we have more testing & benchmarking to do, and then we
 need to put together a tutorial to explain how to use it at the Solr layer.
 I recently cleaned up the READMEs a bit.  Try downloading the trunk codebase,
 and follow the README.  It points to another README which shows off a demo
 webapp.  At the conclusion of this, you'll need to examine the tests and
 webapp a bit to figure out how to apply it in your app.  We don't yet have a
 tutorial as the framework has been in flux, although it has stabilized a good
 deal.

 Oh... by the way, this works off of Lucene/Solr trunk.  Within the past week 
 there was a major change to trunk and LSP won't compile until we make 
 updates.  Either Ryan McKinley or I will get to that by the end of the week.  
 So unless you have access to 2-week old maven artifacts of Lucene/Solr, 
 you're stuck right now.

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:

 I have looked at the code being shared on the
 lucene-spatial-playground and was wondering if anyone could provide
 some details as to its state.  Specifically I'm looking to add
 geospatial support to my application based on a user provided polygon,
 is this currently possible using this extension?









Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
Ryan just updated LSP for Lucene/Solr trunk compatibility, so you should do a
mvn clean install and you'll be back in business.

On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:

 Thanks for responding so quickly, I don't mind waiting a bit.  I'll
 hang out until the updates have been made.  Thanks again.
 



Reading Solr's JSON

2011-07-20 Thread Sowmya V.B.
Hi All

Which is the best way to read Solr's JSON output from Java code?
There seems to be a JSONParser in one of the jar files in Solr's lib
(org.apache.noggit...), but I don't understand how to read the parsed
output with it.

Are there any better JSON parsers for Java?

S

-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Solr suggester and spell checker

2011-07-20 Thread abhayd
Hi,

I am having the same issue. Did you find a solution to this problem?



Re: Reading Solr's JSON

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 10:58 AM, Sowmya V.B. vbsow...@gmail.com wrote:
 Which is the best way to read Solr's JSON output, from a Java code?

You could use SolrJ - it handles parsing for you (and uses the most
efficient binary format by default).

 There seems to be a JSONParser in one of the jar files in Solr's lib
 (org.apache.noggit...), but I don't understand how to read the parsed
 output with it.

If you just want to deserialize into objects (Maps, Lists, etc) then it's easy:

ObjectBuilder.fromJSON(my_json_string)
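
For example, a small sketch (the JSON string here is made up):

import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.apache.noggit.ObjectBuilder;

public class JsonSketch {
    public static void main(String[] args) throws IOException {
        String json = "{\"response\":{\"numFound\":1,\"docs\":[{\"id\":\"1\"}]}}";
        Map top = (Map) ObjectBuilder.fromJSON(json);      // objects become Maps
        Map response = (Map) top.get("response");
        List docs = (List) response.get("docs");           // arrays become Lists
        System.out.println(((Map) docs.get(0)).get("id")); // prints: 1
    }
}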

-Yonik
http://www.lucidimagination.com


Manipulating a Fuzzy Query's Prefix Length

2011-07-20 Thread Kyle Lee
We're performing fuzzy searches on a field possessing a large number of
unique terms. Specifying a required minimum similarity of 0.7 results in a
query execution time of 13-15 seconds, which stands in stark contrast to our
average query time of 40ms.

We suspect that the performance problem most likely emanates from the
enumeration over all the unique terms in the index. The Lucene documentation
for FuzzyQuery supports this theory with the following warning:

*Warning:* this query is not very scalable with its default prefix length
of 0 - in this case, *every* term will be enumerated and cause an edit score
calculation.

We would therefore like to set the prefix length to one or two, mandating
that the first couple of characters match and thereby substantially reduce
the number of terms enumerated. Is this possible with Solr? I haven't yet
discovered a method, if so. Any help would be greatly appreciated.
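
One idea I'm considering, in case there is no built-in option: a small custom
QParserPlugin that sets Lucene's fuzzy prefix length before parsing. An
untested sketch (the default field, parser name, and registration details are
assumptions):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class PrefixFuzzyQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            public Query parse() throws ParseException {
                QueryParser qp = new QueryParser(Version.LUCENE_33, "text",
                        req.getSchema().getQueryAnalyzer());
                qp.setFuzzyPrefixLength(2); // require the first 2 chars to match
                return qp.parse(qstr);
            }
        };
    }
}

// Registered in solrconfig.xml as:
//   <queryParser name="pfuzzy" class="PrefixFuzzyQParserPlugin"/>
// and used as q={!pfuzzy}roam~0.7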


Tokenizer Question

2011-07-20 Thread Jamie Johnson
I have a query which starts out with something like name:john; I
need to expand this to something like name:(john johnny). I've
implemented a custom tokenizer which gets close, but isn't quite right:
it outputs name:"john johnny". Is there a simple example of doing
what I'm attempting?


How can i find a document by a special id?

2011-07-20 Thread Per Newgro

Hi,

I'm new to Solr. I built an application using the standard Solr 3.3
examples as defaults.
My id field is a string and is copied to a solr.TextField (searchtext)
for search queries, roughly as sketched below.
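
(In schema.xml terms, simplified; the exact field types are assumptions:)

<field name="id" type="string" indexed="true" stored="true"/>
<field name="searchtext" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="id" dest="searchtext"/>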

All works fine except when I try to get documents by a special id.

Let me explain the details. Assume id = 1234567. I would like to query this
document by using q=searchtext:AB1234567. The prefix (AB) is acting as a
pseudo-id in our system. Users know it and search for it. But it's not
findable, because the Solr index only knows the short id.

Adding a new document with the prefixed id as id is not an option; then I
would have to add many documents.

To my understanding, stemming and n-gram tokenizing are not possible
because they act on tokens longer than the search token.

How can I do this?

Thanks
Per


Re: How can i find a document by a special id?

2011-07-20 Thread Kyle Lee
Perhaps I'm missing something, but if your fields are indexed as 1234567
but users are searching for AB1234567, is it not possible simply to strip
the prefix from the user's input before sending the request?

On Wed, Jul 20, 2011 at 10:57 AM, Per Newgro per.new...@gmx.ch wrote:

 Hi,

 I'm new to Solr. I built an application using the standard Solr 3.3
 examples as defaults.
 My id field is a string and is copied to a solr.TextField (searchtext)
 for search queries.
 All works fine except when I try to get documents by a special id.

 Let me explain the details. Assume id = 1234567. I would like to query
 this document by using q=searchtext:AB1234567. The prefix (AB) is acting
 as a pseudo-id in our system. Users know it and search for it. But it's
 not findable, because the Solr index only knows the short id.

 Adding a new document with the prefixed id as id is not an option; then
 I would have to add many documents.

 To my understanding, stemming and n-gram tokenizing are not possible
 because they act on tokens longer than the search token.

 How can I do this?

 Thanks
 Per



Re: Tokenizer Question

2011-07-20 Thread Kyle Lee
I'm not sure how to accomplish what you're asking, but have you considered
using a synonyms file? This would also allow you to catch ostensibly
unrelated name substitutes such as Robert -> Bob and Richard -> Dick.

On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson jej2...@gmail.com wrote:

 I have a query which starts out with something like name:john; I
 need to expand this to something like name:(john johnny). I've
 implemented a custom tokenizer which gets close, but isn't quite right:
 it outputs name:"john johnny". Is there a simple example of doing
 what I'm attempting?



Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
My use case really isn't names; I just used that as a simplification.
I did look at the synonym filter to see if I could implement a similar
filter (if that was a more appropriate place to do so), but even after
doing that I ended up with the same result.
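
For anyone following along, the synonym-style trick is to inject the variant
as a second token with a position increment of 0, so both tokens occupy the
same position and the query parser produces name:(john johnny) rather than a
phrase. An untested sketch against the Lucene 3.x analysis API (the filter
name and the hard-coded john/johnny pair are only for illustration):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

public final class VariantInjectingFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);
    private AttributeSource.State pending; // token we still owe a variant for

    public VariantInjectingFilter(TokenStream input) {
        super(input);
    }

    public boolean incrementToken() throws IOException {
        if (pending != null) {
            restoreState(pending);               // re-emit the saved token...
            pending = null;
            termAtt.setEmpty().append("johnny"); // ...with the variant text
            posIncAtt.setPositionIncrement(0);   // at the SAME position as "john"
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        if ("john".equals(termAtt.toString())) {
            pending = captureState();            // emit the variant on the next call
        }
        return true;
    }
}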

On Wed, Jul 20, 2011 at 12:07 PM, Kyle Lee randall.kyle@gmail.com wrote:
 I'm not sure how to accomplish what you're asking, but have you considered
 using a synonyms file? This would also allow you to catch ostensibly
 unrelated name substitutes such as Robert -> Bob and Richard -> Dick.

 On Wed, Jul 20, 2011 at 10:57 AM, Jamie Johnson jej2...@gmail.com wrote:

 I have a query which starts out with something like name:john; I
 need to expand this to something like name:(john johnny). I've
 implemented a custom tokenizer which gets close, but isn't quite right:
 it outputs name:"john johnny". Is there a simple example of doing
 what I'm attempting?




Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks for the update David, I'll give that a try now.

On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Ryan just updated LSP for Lucene/Solr trunk compatibility, so you should do a
 mvn clean install and you'll be back in business.


Re: How can i find a document by a special id?

2011-07-20 Thread Per Newgro

On 20.07.2011 18:03, Kyle Lee wrote:

Perhaps I'm missing something, but if your fields are indexed as 1234567
but users are searching for AB1234567, is it not possible simply to strip
the prefix from the user's input before sending the request?


Sorry for not being clear here. I only use a single search field. It can
contain multiple search words.

One of them is the id, so I don't really know whether a search word is an id.
The use case is: we have a product database with some items. A product has
an id, name, features, etc. They all go into the described searchtext field.
We promote our products in different media, so every product can have a
media id (AB is the media code, 1234567 is the id). And users should be able
to find the product by id and by media id.

I hope I could explain myself better.

Thanks for helping me
Per


Wiki Error JSON syntax

2011-07-20 Thread Remy Loubradou
Hi,
I was writing a Solr client API for Node and I found an error on the page
http://wiki.apache.org/solr/UpdateJSON: in the section Update Commands the
JSON is not valid because there are duplicate keys (add and delete each
appear twice). I tried with an array and it doesn't work either; I got error
400. I think that's because the syntax is bad.

I don't really know if this is the right place to talk about it, but it's the
only place I found. Sorry if it's not.

Thanks,

And I love Solr :)


Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-20 Thread mdz-munich
Here we go ...

This time we tried to use the old LogByteSizeMergePolicy and
SerialMergeScheduler:

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
<mergeScheduler class="org.apache.lucene.index.SerialMergeScheduler"/>

We did this before, just to be sure ... 

~300 Documents:

SEVERE: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at org.apache.lucene.index.FieldsReader.init(FieldsReader.java:129)
at
org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
at
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778)
at 
org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:140)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:736)
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
... 44 more

20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute
INFO: [core.digi20] webapp=/solr path=/update params={} status=500
QTime=12302 
20.07.2011 18:07:30 org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
at
org.apache.lucene.store.MMapDirectory$MMapIndexInput.init(MMapDirectory.java:264)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
at 

Re: Wiki Error JSON syntax

2011-07-20 Thread Yonik Seeley
On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou
remyloubra...@gmail.com wrote:
 Hi,
 I was writing a Solr client API for Node and I found an error on the page
 http://wiki.apache.org/solr/UpdateJSON: in the section Update Commands the
 JSON is not valid because there are duplicate keys (add and delete each
 appear twice).

It's a common misconception that it's invalid JSON.  Duplicate keys
are in fact legal.
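
The update syntax on that wiki page leans on exactly that, e.g.:

{
  "add":    {"doc": {"id": "1"}},
  "add":    {"doc": {"id": "2"}},
  "delete": {"id": "3"},
  "delete": {"query": "title:obsolete"}
}

Solr reads this with a streaming parser that sees each repeated key as it
arrives; many object-mapping JSON libraries instead keep only the last value
per key, which is probably what tripped up your client.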

-Yonik
http://www.lucidimagination.com

I tried with an array and it doesn't work either; I got error
 400. I think that's because the syntax is bad.

 I don't really know if I am at the good place to talk about that but ...
 that the only place I found. Sorry if it's not.

 Thanks,

 And I love Solr :)



Re: query time boosting in solr

2011-07-20 Thread Tomás Fernández Löbbe
So, what you want is to get exactly the same result set as for the query
scientific, but have the documents that also match Field1:[20 TO 30] score
higher, right?

On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B. vbsow...@gmail.com wrote:

 Hi Tomas

 Here is what I was trying to give.


 http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on


This query seems OK for that purpose.


 Over here, I was trying to change the range of Field1, keeping everything
 else intact. Here are my observations:

 1) The number of results found remains the same; only the order of the
 results varies.

Isn't this what was expected?


 2) The boost factor (10) does not seem to have any influence at all.

It's in the parsed query. Why do you think it doesn't have an influence? Can
you send the debug query output for a document that matches the bq?
I tried it with the Solr example and this is what I see:

http://localhost:8983/solr/select?defType=dismax&q=display&bq=weight:[0%20TO%2010]^10&start=0&rows=30&debugQuery=on&qf=features%20name

This is the debug output for a document that match the query and the boost
query:

<str name="MA147LL/A">

1.137027 = (MATCH) sum of:
  0.1994111 = (MATCH) max of:
0.1994111 = (MATCH) weight(features:display in 0), product of:
  0.34767273 = queryWeight(features:display), product of:
3.7080503 = idf(docFreq=1, maxDocs=30)
0.0937616 = queryNorm
  0.57355976 = (MATCH) fieldWeight(features:display in 0), product of:
1.4142135 = tf(termFreq(features:display)=2)
3.7080503 = idf(docFreq=1, maxDocs=30)
0.109375 = fieldNorm(field=features, doc=0)
  0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product of:
10.0 = boost
0.0937616 = queryNorm
</str>

and this is the debug output for a document that only match the main query:

<str name="VA902B">
0.4834455 = (MATCH) sum of:
  0.4834455 = (MATCH) max of:
0.4834455 = (MATCH) weight(name:display in 12), product of:
  0.34767273 = queryWeight(name:display), product of:
3.7080503 = idf(docFreq=1, maxDocs=30)
0.0937616 = queryNorm
  1.3905189 = (MATCH) fieldWeight(name:display in 12), product of:
1.0 = tf(termFreq(name:display)=1)
3.7080503 = idf(docFreq=1, maxDocs=30)
0.375 = fieldNorm(field=name, doc=12)
</str>

Do you have something similar??



 Here is what the debugQuery says:
 <str name="parsedquery">+DisjunctionMaxQuery((text:scientif)) () Field1:[20.0 TO 30.0]^10.0</str>
 <str name="parsedquery_toString">+(text:scientif) () Field1:[20.0 TO 30.0]^10.0</str>

 From these, it seems like it is just filtering the results based on the
 Field1 values, rather than performing a boost query.

 S.


Re: How can i find a document by a special id?

2011-07-20 Thread Kyle Lee
Is the mediacode always alphabetic, and is the ID always numeric?


Schema design/data import

2011-07-20 Thread Travis Low
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes
on the search form.  I don't think we could easily do that if we just crawl
the site.  But with a detailed schema, I'm having trouble understanding how
we could import and index from the database, and also index the related
files, and have the same schema being populated, especially with the number
of related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of
time on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut,
is that some of these files are zipped, and some of the zip files may
contain other zip files, to maybe 3 or 4 levels deep.

Help, please?

cheers,

Travis



-- 
Travis Low, Director of Development
t...@4centurion.com
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com



Re: How can i find a document by a special id?

2011-07-20 Thread Per Newgro

Am 20.07.2011 19:23, schrieb Kyle Lee:

Is the mediacode always alphabetic, and is the ID always numeric?


No sadly not. We expose our products on too many medias :-).

Per


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
So I've pulled the latest and can run the example, I've tried to move
my config over and am having a bit of an issue when executing queries,
specifically I get this:

Unable to read: POLYGON((...

looking at the code it's using the simple spatial context, how do I
specify JtsSpatialContext?

On Wed, Jul 20, 2011 at 12:13 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks for the update David, I'll give that a try now.

 On Wed, Jul 20, 2011 at 10:58 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Ryan just updated LSP for Lucene/Solr trunk compatibility so you should do a 
 mvn clean install and you'll be back in business.

 On Jul 20, 2011, at 10:37 AM, Jamie Johnson wrote:

 Thanks for responding so quickly, I don't mind waiting a bit.  I'll
 hang out until the updates have been made.  Thanks again.

 On Tue, Jul 19, 2011 at 3:51 PM, Smiley, David W. dsmi...@mitre.org wrote:
  Hi Jamie.
  I work on LSP; it can index polygons and query for them. Although the 
  capability is there, we have more testing & benchmarking to do, and then 
  we need to put together a tutorial to explain how to use it at the Solr 
  layer.  I recently cleaned up the READMEs a bit.  Try downloading the 
  trunk codebase, and follow the README.  It points to another README which 
  shows off a demo webapp.  At the conclusion of this, you'll need to 
  examine the tests and webapp a bit to figure out how to apply it in your 
  app.  We don't yet have a tutorial as the framework has been in flux, 
  although it has stabilized a good deal.

  Oh... by the way, this works off of Lucene/Solr trunk.  Within the past 
  week there was a major change to trunk and LSP won't compile until we make 
  updates.  Either Ryan McKinley or I will get to that by the end of the 
  week.  So unless you have access to 2-week old maven artifacts of 
  Lucene/Solr, you're stuck right now.

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Jul 19, 2011, at 3:03 PM, Jamie Johnson wrote:

 I have looked at the code being shared on the
 lucene-spatial-playground and was wondering if anyone could provide
 some details as to its state.  Specifically I'm looking to add
 geospatial support to my application based on a user provided polygon,
 is this currently possible using this extension?












Re: Wiki Error JSON syntax

2011-07-20 Thread Remy Loubradou
I think I can trust you, but this is weird.
Funny thing: if you try to validate this JSON on http://jsonlint.com/,
duplicate keys are automatically removed. But then how can you possibly
generate this JSON from a JavaScript object?

It would be really nice to combine both ways that you show on the page.
Something like:

{
  "add": [
    {
      "doc": {
        "id": "DOC1",
        "my_boosted_field": {
          "boost": 2.3,
          "value": "test"
        },
        "my_multivalued_field": [
          "aaa",
          "bbb"
        ]
      }
    },
    {
      "commitWithin": 5000,
      "overwrite": false,
      "boost": 3.45,
      "doc": {
        "f1": "v2"
      }
    }
  ],
  "commit": {},
  "optimize": {
    "waitFlush": false,
    "waitSearcher": false
  },
  "delete": [
    {
      "id": "ID"
    },
    {
      "query": "QUERY"
    }
  ]
}

Thank you for your previous response, Yonik.

2011/7/20 Yonik Seeley yo...@lucidimagination.com

 On Wed, Jul 20, 2011 at 12:16 PM, Remy Loubradou
 remyloubra...@gmail.com wrote:
  Hi,
  I was writing a Solr Client API for Node and I found an error on this
 page http://wiki.apache.org/solr/UpdateJSON, in the section Update Commands:
 the JSON is not valid because there are duplicate keys (add and delete each
 appear twice).

 It's a common misconception that it's invalid JSON.  Duplicate keys
 are in fact legal.

 -Yonik
 http://www.lucidimagination.com

 I tried with an array and it doesn't work either; I got error
  400. I think that's because the syntax is bad.
 
  I don't really know if this is the right place to talk about that, but
  it's the only place I found. Sorry if it's not.
 
  Thanks,
 
  And I love Solr :)
 



Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
You can set the system property SpatialContextProvider to 
com.googlecode.lucene.spatial.base.context.JtsSpatialContext

~ David

On Jul 20, 2011, at 2:02 PM, Jamie Johnson wrote:

 So I've pulled the latest and can run the example, I've tried to move
 my config over and am having a bit of an issue when executing queries,
 specifically I get this:
 
 Unable to read: POLYGON((...
 
 looking at the code it's using the simple spatial context, how do I
 specify JtsSpatialContext?
 


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Where do you set that?

On Wed, Jul 20, 2011 at 2:37 PM, Smiley, David W. dsmi...@mitre.org wrote:
 You can set the system property SpatialContextProvider to 
 com.googlecode.lucene.spatial.base.context.JtsSpatialContext

 ~ David



Curl Tika not working with blanks in literal.field

2011-07-20 Thread Peralta Gutiérrez del Álamo





Hi,
I'm trying to index binary documents with curl and Tika for extracting text.

The problem is that when I set the value of a field with blank spaces using
the input parameter literal.fieldname=value, the document is not indexed.

The sentence I send is the following:

curl http://localhost:8983/solr/update/extract?literal.id=doc1\&literal.url=/mnt/windows/Ofertas/2006 Portal Intranet/DOCUMENTACION/datos.doc\&uprefix=attr_\&fmap.content=text\&commit=true -F myfile=\@/mnt/windows/Ofertas/DOCUMENTACION/datos.doc

That is, literal.url=value with blanks apparently is not working.

Re: Geospatial queries in Solr

2011-07-20 Thread Smiley, David W.
The notion of a system property is a java concept; google it and you'll learn 
more.

BTW, despite my responsiveness in helping right now; I'm pretty busy this week 
so this won't necessarily last long.
~ David

On Jul 20, 2011, at 2:43 PM, Jamie Johnson wrote:

 Where do you set that?
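
For reference: a system property is passed with a -D flag when the JVM starts,
or set programmatically before the spatial context is first resolved. A minimal
sketch in Java; the property name and value are the ones from David's reply,
and the surrounding class is illustrative:

public class SpatialContextSetup {
    public static void main(String[] args) {
        // Equivalent to starting the JVM (e.g. your servlet container) with:
        //   -DSpatialContextProvider=com.googlecode.lucene.spatial.base.context.JtsSpatialContext
        // This must run before the spatial context is first looked up.
        System.setProperty("SpatialContextProvider",
                "com.googlecode.lucene.spatial.base.context.JtsSpatialContext");
        // ... start embedded Solr / run spatial queries after this point
    }
}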
 



RE: embedded solrj doesn't refresh index

2011-07-20 Thread Jianbin Dai
Hi, thanks for the response. Here is the whole picture:
I use DIH to import and index data. And use embedded solrj connecting to the
index file for search and other operations.
Here is what I found: Once data are indexed (and committed), I can see the
changes through solr web server, but not from embedded solrj. If I restart
the embedded solr server, I do see the changes.
Hope it helps. Thanks.


-Original Message-
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com] 
Sent: Wednesday, July 20, 2011 5:09 AM
To: solr-user@lucene.apache.org
Subject: Re: embedded solrj doesn't refresh index

You should send a commit to your embedded Solr.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/7/20 Jianbin Dai j...@huawei.com

 Hi,



 I am using embedded solrj. After I add new doc to the index, I can see the
 changes through solr web, but not from embedded solrj. But after I restart
 the embedded solrj, I do see the changes. It works as if there was a
cache.
 Anyone knows the problem? Thanks.



 Jianbin
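
A minimal sketch of Marco's suggestion: issue the commit through the embedded
server itself, which forces it to open a new searcher so changes become visible
without a restart. The solr home, core name, and setup are illustrative
assumptions:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedCommit {
    public static void main(String[] args) throws Exception {
        // Boots the cores found under solr.solr.home (illustrative setup).
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer solr = new EmbeddedSolrServer(container, "core0");

        // An explicit (even empty) commit through the embedded server makes
        // it open a new searcher over the latest index segments.
        solr.commit();

        container.shutdown();
    }
}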





set queryNorm to 1?

2011-07-20 Thread Elaine Li
Hi Folks,

My boost function is bf=div(product(num_clicks,0.3),sum(num_clicks,25)).
I would like to add its score directly to the final score instead of
letting it be normalized by the queryNorm value.
Is there any way to do it?

Thanks.

Elaine


Re: query time boosting in solr

2011-07-20 Thread Sowmya V.B.
Hi Tomas

Yeah, I now understand it. I was confused about interpreting the output.

Thanks for the comments.

Sowmya.

2011/7/20 Tomás Fernández Löbbe tomasflo...@gmail.com

 So, what you want is the exact same result set as if the query were just
 scientific, but documents that also match Field1:[20 TO 30] should score
 higher, right?

 On Wed, Jul 20, 2011 at 10:53 AM, Sowmya V.B. vbsow...@gmail.com wrote:

  Hi Tomas
 
  Here is what I was trying to give.
 
 
 
  http://localhost:8085/apache-solr-3.3.0/select?indent=on&version=2.2&defType=dismax&q=scientific&bq=Field1:[20%20TO%2030]^10&start=0&rows=30&qf=text&fl=Field1,docid&debugQuery=on
 

 This query seems OK for that purpose.

 
  Over here, I was trying to change the range of Field1, keeping everything
  else intact. Here are my observations:
 
   1) The number of results found remains the same; only the order of the
   results varies.
 
 Isn't this what was expected?


   2) The boost factor (10) does not seem to have any influence at all.
 
 It's in the parsed query. Why do you think it doesn't have an influence?
 Can you send the debug query output for a document that matches the bq?
 I tried it with the Solr example and this is what I see:


 http://localhost:8983/solr/select?defType=dismax&q=display&bq=weight:[0%20TO%2010]^10&start=0&rows=30&debugQuery=on&qf=features%20name

 This is the debug output for a document that match the query and the boost
 query:

 <str name="MA147LL/A">
 1.137027 = (MATCH) sum of:
   0.1994111 = (MATCH) max of:
     0.1994111 = (MATCH) weight(features:display in 0), product of:
       0.34767273 = queryWeight(features:display), product of:
         3.7080503 = idf(docFreq=1, maxDocs=30)
         0.0937616 = queryNorm
       0.57355976 = (MATCH) fieldWeight(features:display in 0), product of:
         1.4142135 = tf(termFreq(features:display)=2)
         3.7080503 = idf(docFreq=1, maxDocs=30)
         0.109375 = fieldNorm(field=features, doc=0)
   0.937616 = (MATCH) ConstantScore(weight:[0.0 TO 10.0]^10.0)^10.0, product of:
     10.0 = boost
     0.0937616 = queryNorm
 </str>

 and this is the debug output for a document that only match the main query:

 <str name="VA902B">
 0.4834455 = (MATCH) sum of:
   0.4834455 = (MATCH) max of:
     0.4834455 = (MATCH) weight(name:display in 12), product of:
       0.34767273 = queryWeight(name:display), product of:
         3.7080503 = idf(docFreq=1, maxDocs=30)
         0.0937616 = queryNorm
       1.3905189 = (MATCH) fieldWeight(name:display in 12), product of:
         1.0 = tf(termFreq(name:display)=1)
         3.7080503 = idf(docFreq=1, maxDocs=30)
         0.375 = fieldNorm(field=name, doc=12)
 </str>

 Do you have something similar??



   Here is what the debugQuery says:
   <str name="parsedquery">+DisjunctionMaxQuery((text:scientif)) ()
   Field1:[20.0 TO 30.0]^10.0</str>
   <str name="parsedquery_toString">+(text:scientif) () Field1:[20.0 TO
   30.0]^10.0</str>
 
   From these, it seems like it's just filtering the results based on the
   Field1 values, rather than performing a boost query.
 
  S.
 
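
For completeness, the same dismax boost query expressed through SolrJ instead
of a raw URL. A minimal sketch; the host and field names are taken from the
thread, everything else is an illustrative assumption:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8085/apache-solr-3.3.0");
        SolrQuery q = new SolrQuery("scientific");
        q.set("defType", "dismax");
        q.set("qf", "text");
        q.set("bq", "Field1:[20 TO 30]^10"); // boosts matching docs, does not filter
        q.set("debugQuery", "true");         // shows the boost in the explain output
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound());
    }
}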

Schema Design/Data Import

2011-07-20 Thread travis

[Apologies if this is a duplicate -- I have sent several messages from my work 
email and they just vanish, so I subscribed with my personal email]
 
Greetings.  I am struggling to design a schema and a data import/update
strategy for some semi-complicated data.  I would appreciate any input.

What we have is a bunch of database records that may or may not have files
attached.  Sometimes no files, sometimes 50.

The requirement is to index the database records AND the documents, and the
search results would be just links to the database records.

I'd love to crawl the site with Nutch and be done with it, but we have a
complicated search form with various codes and attributes for the database
records, so we need a detailed schema that will loosely correspond to boxes on
the search form.  I don't think we could easily do that if we just crawl the
site.  But with a detailed schema, I'm having trouble understanding how we
could import and index from the database, and also index the related files,
and have the same schema being populated, especially with the number of
related documents being variable (maybe index them all to one field?).

We have a lot of flexibility on how we can build this, so I'm open to any
suggestions or pointers for further reading.  I've spent a fair amount of time
on the wiki but I didn't see anything that seemed directly relevant.

An additional difficulty, that I am willing to overlook for the first cut, is
that some of these files are zipped, and some of the zip files may contain
other zip files, to maybe 3 or 4 levels deep.

Help, please?
 
cheers,

Travis

Re: How can i find a document by a special id?

2011-07-20 Thread Chris Hostetter

: Am 20.07.2011 19:23, schrieb Kyle Lee:
:  Is the mediacode always alphabetic, and is the ID always numeric?
:  
: No sadly not. We expose our products on too many medias :-).

If i'm understanding you correctly, you're saying even the prefix AB is 
not special, that there could be any number of prefixes identifying 
different mediacodes? and the product ids aren't all numeric?

your question seems absurd.

I can only assume that I am horribly misunderstanding your situation.  
(which is very easy to do when you only have a single contrived piece of 
example data to go on)

As a general rule, it's not a good idea to think about Solr in the same 
way as a relational database, but perhaps if you imagine for a moment that 
your Solr index *was* a (read only) relational database, with each 
solr field corresponding to a column in your DB, and then you described in 
pseudo-code/SQL how you would go about doing the types of id lookups you 
want to do, it might give us a better idea of your situation so we can 
suggest an approach for dealing with it.


-Hoss


Re: Tokenizer Question

2011-07-20 Thread Chris Hostetter

When the QueryParser gives hunks of text to an analyzer, and that analyzer 
produces multiple terms, the query parser has to decide how to build a 
query out of it.

if the terms have identical position information, then it always builds an 
OR query (this is the typical synonym situation).  If the terms have 
differing positions, then the behavior is driven by the 
autoGeneratePhraseQueries attribute of the FieldType -- the default value 
of this depends on the version attribute of your top-level <schema/> tag.


: I have a query which starts out with something like name:john, I
: need to expand this to something like name:(john johnny).  I've
: implemented a custom tokenizer which gets close, but isn't quite right:
: it outputs name:"john johnny".  Is there a simple example of doing
: what I'm attempting?
: 

-Hoss


RE: defType argument weirdness

2011-07-20 Thread Chris Hostetter

: I do understand what they do (at least well enough to use them), but I 
: find it confusing that it's called defType as a main param, but type 
: in a LocalParam, when to me they both seem to do the same thing -- which 

type as a localparam in a query string defines the type of query string 
it is -- picking the parser.

defType determines the default value for type in the primary query 
string.

: (and then there's 'qt', often confused with defType/type by newbies, 
: since they guess it stands for 'query type', but which should probably 
: actually have been called 'requestHandler'/'rh' instead, since that's 
: what it actually chooses, no?  It gets very confusing).
: 
: If it's generally recognized it's confusing and perhaps a somewhat 
: inconsistent mental model being implied, I wonder if there'd be any 
: interest in renaming these to be more clear, leaving the old ones as 
: aliases/synonyms for backwards compatibility (perhaps with a long 

qt is historic and already being de-emphasized in favor of using 
path based names (ie: http://solr/handlername instead of 
http://solr/select?qt=/handlername) so adding yet another alias for that 
would be moving in the wrong direction.

type and defType probably make more sense when you think of 
them in that order.  I don't see a strong need to confuse/complicate the 
issue by adding more aliases for them.



-Hoss


Re: defType argument weirdness

2011-07-20 Thread Jonathan Rochkind
Huh, I'm still not completely following. I'm sure it makes sense if you 
understand the underlying implementation, but I don't understand how 
'type' and 'defType' don't mean exactly the same thing, just need to be 
expressed differently in different locations.


Sorry for beating a dead horse, but maybe it would help if you could 
tell me what I'm getting wrong here:


defType can only go in top-level param, and determines the query parser 
for the overall q top level param.


type can only go  in a LocalParam, and determines the query parser that 
applies to whatever query (top-level or nested) that the LocalParam 
syntax lives in.  (Just as any other LocalParams apply only to the query 
that the LocalParam block lives in -- and nested queries inherit their 
query parser from the query they are nested in unless over-ridden, just 
as they inherit every other param from the query they are nested in 
unless over-ridden, nothing special here).


Therefore for instance:

defType=dismax&q=foo

is equivalent to

defType=lucene&q={!type=dismax}foo


Where am I straying in my mental model here? Because if all that is 
true, I don't understand how 'type' and 'defType' mean anything 
different -- they both choose the query parser, do they not? (which to 
me means I wish they were both called 'parser' instead of 'type' -- a 
'type' here is the name of a query parser, is it not?)  It's just that 
if it's in the top-level param you have to use 'defType', and if it's in 
a LocalParam you have to use 'type'.  That's been my mental model, which 
has served me well so far, but if it's wrong and it's going to trip me 
up on some as yet unencountered use cases, it would probably be good for 
me to know it!  (And probably good for some documentation to be written 
somewhere explaining it too). (And if they really are different, 
prefixing def to type is not making it very clear what the 
difference is! What's def supposed to stand for anyway?)


Jonathan


On 7/20/2011 3:49 PM, Chris Hostetter wrote:

: I do understand what they do (at least well enough to use them), but I
: find it confusing that it's called defType as a main param, but type
: in a LocalParam, when to me they both seem to do the same thing -- which

type as a localparam in a query string defines the type of query string
it is -- picking the parser.

defType determines the default value for type in the primary query
string.

: (and then there's 'qt', often confused with defType/type by newbies,
: since they guess it stands for 'query type', but which should probably
: actually have been called 'requestHandler'/'rh' instead, since that's
: what it actually chooses, no?  It gets very confusing).
:
: If it's generally recognized it's confusing and perhaps a somewhat
: inconsistent mental model being implied, I wonder if there'd be any
: interest in renaming these to be more clear, leaving the old ones as
: aliases/synonyms for backwards compatibility (perhaps with a long

qt is historic and already being de-emphasized in favor of using
path based names (ie: http://solr/handlername instead of
http://solr/select?qt=/handlername) so adding yet another alias for that
would be moving in the wrong direction.

type and defType probably make more sense when you think of
them in that order.  I don't see a strong need to confuse/complicate the
issue by adding more aliases for them.



-Hoss



Re: Tokenizer Question

2011-07-20 Thread Jamie Johnson
Thanks, I'll try that now, I'm assuming I need to add the position
increment and offset attributes?

On Wed, Jul 20, 2011 at 3:44 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 When the QueryParser gives hunks of text to an analyzer, and that analyzer
 produces multiple terms, the query parser has to decide how to build a
 query out of it.

  if the terms have identical position information, then it always builds an
  OR query (this is the typical synonym situation).  If the terms have
  differing positions, then the behavior is driven by the
  autoGeneratePhraseQueries attribute of the FieldType -- the default value
  of this depends on the version attribute of your top-level <schema/> tag.


  : I have a query which starts out with something like name:john, I
  : need to expand this to something like name:(john johnny).  I've
  : implemented a custom tokenizer which gets close, but isn't quite right:
  : it outputs name:"john johnny".  Is there a simple example of doing
  : what I'm attempting?
 :

 -Hoss
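
For the record, the injected token needs a position increment of 0 so it lands
on the same position as the original term, which makes the parser build an OR
rather than a phrase. A minimal sketch of such a filter against the Lucene/Solr
3.x-era API; the hard-coded john/johnny mapping is purely illustrative:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class NicknameFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);
    private String pendingSynonym;
    private State savedState;

    public NicknameFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingSynonym != null) {
            restoreState(savedState);           // reuse offsets of the original token
            termAtt.setEmpty().append(pendingSynonym);
            posIncrAtt.setPositionIncrement(0); // same position -> OR, not a phrase
            pendingSynonym = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        if ("john".equals(termAtt.toString())) { // illustrative hard-coded mapping
            pendingSynonym = "johnny";
            savedState = captureState();
        }
        return true;
    }
}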



solrj and XML result sets

2011-07-20 Thread Joe Shubitowski
Does anyone have advice as to how to produce an XML result set using SolrJ? 
My Java coder says he can *only* produce result sets in javabin - which is fine 
in most cases - but we have a need for an XML output stream as well.

Thanks...
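
Javabin is only SolrJ's default wire format; the client can be switched to
XML, or you can skip SolrJ entirely and fetch wt=xml over plain HTTP. A
minimal sketch, assuming a SolrJ 1.4/3.x-era client and an illustrative
server URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;

public class XmlResults {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        solr.setParser(new XMLResponseParser()); // talk XML instead of javabin
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println(rsp.getResults().getNumFound());
        // For a raw XML stream with no SolrJ parsing at all, fetch directly:
        //   http://localhost:8983/solr/select?q=*:*&wt=xml
    }
}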


RE: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-20 Thread Robert Petersen
Says it is caused by a Java out of memory error, no?  

-Original Message-
From: mdz-munich [mailto:sebastian.lu...@bsb-muenchen.de] 
Sent: Wednesday, July 20, 2011 9:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 3.3: Exception in thread Lucene Merge Thread #1

Here we go ...

This time we tried to use the old LogByteSizeMergePolicy and
SerialMergeScheduler:

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
<mergeScheduler class="org.apache.lucene.index.SerialMergeScheduler"/>

We did this before, just to be sure ... 

~300 Documents:

SEVERE: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:782)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:264)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:216)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:129)
        at org.apache.lucene.index.SegmentCoreReaders.openDocStores(SegmentCoreReaders.java:244)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:702)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4192)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3859)
        at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2714)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2709)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2705)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3509)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1850)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1814)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1778)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:143)
        at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:183)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:416)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:98)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:563)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:403)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:301)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:162)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:140)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:736)
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:779)
        ... 44 more

20.07.2011 18:07:30 org.apache.solr.core.SolrCore execute
INFO: [core.digi20] webapp=/solr path=/update params={} status=500
QTime=12302 
20.07.2011 18:07:30 org.apache.solr.common.SolrException log
SEVERE: 

Re: How can i find a document by a special id?

2011-07-20 Thread Bill Bell
Why not just search the 2 fields?

q=*:*&fq=mediacode:AB OR id:123456

You could take the user input and replace it:

q=*:*&fq=mediacode:$input OR id:$input

Of course you can also use dismax and wrap with an OR.

Bill Bell
Sent from mobile


On Jul 20, 2011, at 3:38 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 : Am 20.07.2011 19:23, schrieb Kyle Lee:
 :  Is the mediacode always alphabetic, and is the ID always numeric?
 :  
 : No sadly not. We expose our products on too many medias :-).
 
 If i'm understanding you correctly, you're saying even the prefix AB is 
 not special, that there could be any number of prefixes identifying 
 different mediacodes? and the product ids aren't all numeric?
 
 your question seems absurd.
 
 I can only assume that I am horribly misunderstanding your situation.  
 (which is very easy to do when you only have a single contrived piece of 
 example data to go on)
 
 As a general rule, it's not a good idea to think about Solr in the same 
 way as a relational database, but perhaps if you imagine for a moment that 
 your Solr index *was* a (read only) relational database, with each 
 solr field corresponding to a column in your DB, and then you described in 
 pseudo-code/SQL how you would go about doing the types of id lookups you 
 want to do, it might give us a better idea of your situation so we can 
 suggest an approach for dealing with it.
 
 
 -Hoss




Re: Data Import from a Queue

2011-07-20 Thread Bill Bell
Yes this is a good reason for using a queue. I have used Amazon SQS this way 
and it was simple to set up.

Bill Bell
Sent from mobile


On Jul 20, 2011, at 2:59 AM, Stefan Matheis matheis.ste...@googlemail.com 
wrote:

 Brandon,
 
 i don't know how they are using it in detail, but Part of Chef's Architecture 
 is this one:
 
 Chef Server - RabbitMQ - Chef Solr Indexer - Solr
 http://wiki.opscode.com/download/attachments/7274878/chef-server-arch.png
 
 Perhaps not exactly, what you're looking for - but may give you an idea?
 
 Regards
 Stefan
 
 Am 19.07.2011 19:04, schrieb Brandon Fish:
 Let me provide some more details to the question:
 
 I was unable to find any example implementations where individual documents
 (single document per message) are read from a message queue (like ActiveMQ
 or RabbitMQ) and then added to Solr via SolrJ, a HTTP POST or another
 method. Does anyone know of any available examples for this type of import?
 
 If no examples exist, what would be a recommended commit strategy for
 performance? My best guess for this would be to have a queue per core and
 commit once the queue is empty.
 
 Thanks.
 
 On Mon, Jul 18, 2011 at 6:52 PM, Erick 
 Ericksonerickerick...@gmail.comwrote:
 
 This is a really cryptic problem statement.
 
 you might want to review:
 
 http://wiki.apache.org/solr/UsingMailingLists
 
 Best
 Erick
 
 On Fri, Jul 15, 2011 at 1:52 PM, Brandon Fishbrandon.j.f...@gmail.com
 wrote:
 Does anyone know of any existing examples of importing data from a queue
 into Solr?
 
 Thank you.
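
A minimal sketch of the pattern discussed in this thread: one JMS consumer per
core that adds each message as a single document and commits when the queue
drains. The broker URL, queue name, and field names are illustrative
assumptions (ActiveMQ + SolrJ shown; RabbitMQ would be analogous):

import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class QueueIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
        Connection conn =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        conn.start();
        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer =
                session.createConsumer(session.createQueue("solr.core0"));
        boolean pending = false; // adds not yet committed
        while (true) {
            Message msg = consumer.receive(1000); // block up to 1s
            if (msg == null) {
                if (pending) {   // queue drained: commit the batch
                    solr.commit();
                    pending = false;
                }
                continue;
            }
            TextMessage text = (TextMessage) msg;
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", text.getJMSMessageID());
            doc.addField("body", text.getText());
            solr.add(doc);       // one document per message
            pending = true;
        }
    }
}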
 
 
 


Re: Geospatial queries in Solr

2011-07-20 Thread Jamie Johnson
Thanks David.  When trying to execute queries on a complex irregular
polygon (say the shape of NJ) I'm getting results which are actually
outside of that polygon. Is there a setting which controls this
resolution?

On Wed, Jul 20, 2011 at 2:53 PM, Smiley, David W. dsmi...@mitre.org wrote:
 The notion of a system property is a java concept; google it and you'll 
 learn more.

 BTW, despite my responsiveness in helping right now; I'm pretty busy this 
 week so this won't necessarily last long.
 ~ David



Updating fields in an existing document

2011-07-20 Thread Benson Margulies
We find ourselves in the following quandry:

At initial index time, we store a value in a field, and we use it for
facetting. So it, seemingly, has to be there as a field.

However, from time to time, something happens that causes us to want
to change this value. As far as we know, this requires us to
completely re-index the document, which is slow.

It struck me that we can't be the only people to go down this road, so
I write to inquire if we are missing something.


Re: Question on the appropriate software

2011-07-20 Thread Erick Erickson
Solr would work fine for this; your PDF files would have to be interpreted
by Tika, but see the Data Import Handler, FileListEntityProcessor and
TikaEntityProcessor. I don't quite think Nutch is the tool here.

You'll be wanting to do highlighting and a couple of other things

You'll spend some time tweaking results to be what you want, but this
is certainly do-able.

Best
Erick

On Tue, Jul 19, 2011 at 1:29 PM, Matthew Twomey mtwo...@beakstar.com wrote:
 Greetings,

 I'm interested in having a server-based personal document library with a
 few specific features and I'm trying to determine what the most appropriate
 tools are to build it.

 I have the following content which I wish to include in the archive:

 1. A smallish collection of technical books in PDF format (around 100)
 2. Many years of several different magazine subscriptions in PDF format
 (probably another 100 - 200 PDFs)
 3. Several years of personal documents which were scanned in and converted
 to searchable PDF format (300 - 500 documents)
 4. I also have local mirrors of several HTML based reference sites

 I'd like to have the ability to index all of this content and search it from
 a web form (so that I and a few other can reach it from multiple locations).
 Here are two examples of the functionality I'm looking for:

 Scenario 1. What was that software that has all the nutritional data and
 hooks up to some USDA database? I know I read about it in one of my Linux
 Journals last year.

 Now I'd like to be able to pull up the webform and search for nutrition
 USDA. I'd like to restrict the search to the Linux Journal magazine PDFs
 (or refine the results). I'd like results to contain context snippets with
 each search result. Finally most importantly, I'd like multiple results per
 PDF (or all occurrences). The last one is important so that I can actually
 quickly find the right issue (in case there is some advertisement in every
 issue for the last year that contains those terms). When I click on the
 desired result, the PDF is downloaded by my browser.

 Scenario 2. How much have I been paying for property taxes for the last
 five years again? (the bills are all scanned in)

 In this case I'd like to search for my property identification number (which
 is on the bills) and the results should show all the documents that have it,
 with context. Clicking on results downloads the documents. I assume this
 example is simple to achieve if example 1 can be done.

 So in general, my question is - can this be done in a fairly straight
 forward manner with Solr? Is there a more appropriate tool to be using (e.g.
 Nutch?). Also, I have looked high and low for a free, already baked solution
 which can do scenario 1 but haven't been able to find something - so if
 someone knows of such a thing, please let me know.

 Thanks!

 -Matt



RE: Updating fields in an existing document

2011-07-20 Thread Jonathan Rochkind
Nope, you're not missing anything: there's no way to alter a document in an 
index other than reindexing the whole document. Solr's architecture would make 
it difficult (although never say impossible) to do otherwise. But you're right, 
it would be convenient, and not just for you. 

Reindexing a single document ought not to be slow, although if you have many of 
them at once it could be, or if you end up needing to very frequently commit to 
an index it can indeed cause problems. 

From: Benson Margulies [bimargul...@gmail.com]
Sent: Wednesday, July 20, 2011 6:05 PM
To: solr-user
Subject: Updating fields in an existing document

We find ourselves in the following quandry:

At initial index time, we store a value in a field, and we use it for
facetting. So it, seemingly, has to be there as a field.

However, from time to time, something happens that causes us to want
to change this value. As far as we know, this requires us to
completely re-index the document, which is slow.

It struck me that we can't be the only people to go down this road, so
I write to inquire if we are missing something.
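
For the archives: "completely re-index the document" in practice just means
re-adding it with the same uniqueKey, since Solr replaces rather than merges.
A minimal sketch; the field names and URL are illustrative:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReindexOne {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");            // same uniqueKey as before
        doc.addField("facet_field", "newValue"); // the field whose value changed
        // ...every other field must be re-supplied too; Solr replaces the
        // old document, it does not merge into it...
        solr.add(doc);
        solr.commit();
    }
}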


RE: Solr 3.3: Exception in thread Lucene Merge Thread #1

2011-07-20 Thread mdz-munich
Yeah, indeed.

But since the VM is equipped with plenty of RAM (22GB) and it has worked very
well with this setup so far (Solr 3.2), I AM slightly confused, am I?

Maybe we should LOWER the dedicated physical memory? The remaining 10GB are
used for a second Tomcat (8GB) and the OS (Suse). As far as I understand NIO
(mostly un-far), this package can directly use the most efficient
operations of the underlying platform.






 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-Exception-in-thread-Lucene-Merge-Thread-1-tp3185248p3186986.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr not returning results for some key words

2011-07-20 Thread Matthew Twomey

Greetings,

I'm having trouble getting Solr to return results for key words that I 
know for sure are in the index. As a test, I've indexed a PDF of a book 
on Java. I'm trying to search the index for 
UnsupportedOperationException but I get no results. I can see it in 
the index though:


#
[root@myhost apache-solr-1.4.1]# strings 
example/solr/data/index/_0.fdt|grep UnsupportedOperationException

UnsupportedOperationException if the iterator returned by this collec-
throw new UnsupportedOperationException();
UnsupportedOperationException Object does not support method
CHAPTER 9 EXCEPTIONS

UnsupportedOperationException, 87,
[root@myhost apache-solr-1.4.1]#
#

On the other hand, if I search the index for the word support (which 
is also contained in the grep above), I get a hit on this document. 
Furthermore, if I search on support and include highlighted snippets, 
I can see the word UnsupportedOperationException right in there in the 
highlight results!


#
of an object has
been detected where it is prohibited
UnsupportedOperationException Object does not <em>support</em>
#

So why do I get no hits when I search for it?

This happens with many different key words. Any thoughts on how I can 
trouble shoot this or ideas on why it's not working properly?


Thanks,

-Matt


Re: Manipulating a Fuzzy Query's Prefix Length

2011-07-20 Thread Kyle Lee
Update:

Solr/Lucene 4.0 will incorporate a new fuzzy search algorithm with
substantial performance improvements.

To tide us over until this release, we've simply rebuilt from source with a
default prefix length of 2, which will suit our needs until then.

On Wed, Jul 20, 2011 at 10:09 AM, Kyle Lee randall.kyle@gmail.comwrote:

 We're performing fuzzy searches on a field possessing a large number of
 unique terms. Specifying a required minimum similarity of 0.7 results in a
 query execution time of 13-15 seconds, which stands in stark contrast to our
 average query time of 40ms.

 We suspect that the performance problem most likely emanates from the
 enumeration over all the unique terms in the index. The Lucene documentation
 for FuzzyQuery supports this theory with the following warning:

 *Warning:* this query is not very scalable with its default prefix length
 of 0 - in this case, *every* term will be enumerated and cause an edit score
 calculation.

 We would therefore like to set the prefix length to one or two, mandating
 that the first couple of characters match and thereby substantially reduce
 the number of terms enumerated. Is this possible with Solr? I haven't yet
 discovered a method, if so. Any help would be greatly appreciated.
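
For reference, the prefix length is exposed on Lucene's FuzzyQuery
constructor, which is what rebuilding with a default of 2 effectively changes.
A minimal sketch at the Lucene level; the field and term are illustrative:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyPrefixExample {
    public static void main(String[] args) {
        // minimumSimilarity 0.7, prefixLength 2: only terms sharing the first
        // two characters are enumerated for the edit-distance calculation.
        FuzzyQuery q = new FuzzyQuery(new Term("name", "johnson"), 0.7f, 2);
        System.out.println(q);
    }
}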



Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Tal Rotbart
Hi all,

I hope you won't mind me informing the list, but I thought some
Melbourne-based members would find this relevant.

We have noticed that there is a blossoming of Apache Solr/Lucene usage
& development in Melbourne, in addition to a lack of an unofficial,
relaxed gathering to allow some fruitful information and experience
exchange.

We're trying to put together a laid back meet up for developers (and
other interested people) who are currently using Apache Solr (and/or
Lucene) or would like to learn more about it.  Aiming for it to be a
high signal/noise ratio group, with meet ups probably once every two
months.

The first meet up is still TBD, but please join the group if you're
keen to join us for pizza, beer, and a discussion about Solr once we
figure out the date of the first meeting.

Also, please feel free to suggest quick (15 minute) presentations -
whether it be a problem you've solved, a problem you need help solving,
or a generally interesting experience of using Solr.

We're keeping registrations here: http://www.meetup.com/melbourne-solr/

Feel free to pass to co-workers, colleagues who would be interested.

Cheers,
Tal


Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Dave Hall

Hi Tal,

On 21/07/11 14:04, Tal Rotbart wrote:

We have noticed that there is a blossoming of Apache Solr/Lucene usage
& development in Melbourne in addition to a lack of an unofficial,
relaxed gathering to allow some fruitful information and experience
exchange.

We're trying to put together a laid back meet up for developers (and
other interested people) who are currently using Apache Solr (and/or
Lucene) or would like to learn more about it.  Aiming for it to be a
high signal/noise ratio group, with meet ups probably once every two
months.


This sounds great!  I'm not sure I'll be a regular, but if I'm around 
town when it is on I will try to drop in.



The first meet up is still TBD, but please join the group if you're
keen to join us for pizza, beer, and a discussion about Solr once we
figure out the date of the first meeting.

Once a date is decided, please update the Melbourne *UG wiki page so 
others can find out about it.  The wiki has meeting times for various 
user groups around town, which might help you find a time which doesn't 
clash with other groups.  Check it out at http://perl.net.au/wiki/Melbourne


Cheers

Dave


Re: Solr not returning results for some key words

2011-07-20 Thread Matthew Twomey
Ok, apparently I'm not the first to have fallen prey to the maxFieldLength 
gotcha:


http://lucene.472066.n3.nabble.com/Solr-ignoring-maxFieldLength-td473263.html

All fixed now.

-Matt

On 07/20/2011 07:13 PM, Matthew Twomey wrote:

Greetings,

I'm having trouble getting Solr to return results for key words that I 
know for sure are in the index. As a test, I've indexed a PDF of a 
book on Java. I'm trying to search the index for 
UnsupportedOperationException but I get no results. I can see it 
in the index though:


#
[root@myhost apache-solr-1.4.1]# strings 
example/solr/data/index/_0.fdt|grep UnsupportedOperationException

UnsupportedOperationException if the iterator returned by this collec-
throw new UnsupportedOperationException();
UnsupportedOperationException Object does not support method
CHAPTER 9 EXCEPTIONS

UnsupportedOperationException, 87,
[root@myhost apache-solr-1.4.1]#
#

On the other hand, if I search the index for the word support (which 
is also contained in the grep above), I get a hit on this document. 
Furthermore, if I search on support and include highlighted 
snippets, I can see the word UnsupportedOperationException right in 
there in the highlight results!


#
of an object has
been detected where it is prohibited
UnsupportedOperationException Object does not <em>support</em>
#

So why do I get no hits when I search for it?

This happens with many different key words. Any thoughts on how I can 
trouble shoot this or ideas on why it's not working properly?


Thanks,

-Matt




Re: Question on the appropriate software

2011-07-20 Thread Matthew Twomey
Excellent, thanks for the confirmation, Erick. I've started working with 
Solr (just getting my feet wet at this point).


-Matt

On 07/20/2011 05:38 PM, Erick Erickson wrote:

Solr would work fine for this; your PDF files would have to be interpreted
by Tika, but see the Data Import Handler, FileListEntityProcessor and
TikaEntityProcessor. I don't quite think Nutch is the tool here.

You'll be wanting to do highlighting and a couple of other things

You'll spend some time tweaking results to be what you want, but this
is certainly do-able.

Best
Erick

On Tue, Jul 19, 2011 at 1:29 PM, Matthew Twomeymtwo...@beakstar.com  wrote:

Greetings,

I'm interested in having a server-based personal document library with a
few specific features and I'm trying to determine what the most appropriate
tools are to build it.

I have the following content which I wish to include in the archive:

1. A smallish collection of technical books in PDF format (around 100)
2. Many years of several different magazine subscriptions in PDF format
(probably another 100 - 200 PDFs)
3. Several years of personal documents which were scanned in and converted
to searchable PDF format (300 - 500 documents)
4. I also have local mirrors of several HTML based reference sites

I'd like to have the ability to index all of this content and search it from
a web form (so that I and a few other can reach it from multiple locations).
Here are two examples of the functionality I'm looking for:

Scenario 1. What was that software that has all the nutritional data and
hooks up to some USDA database? I know I read about it in one of my Linux
Journals last year.

Now I'd like to be able to pull up the web form and search for "nutrition
USDA". I'd like to restrict the search to the Linux Journal magazine PDFs
(or refine the results). I'd like results to contain context snippets with
each search result. Finally, and most importantly, I'd like multiple results
per PDF (or all occurrences). The last one is important so that I can
actually quickly find the right issue (in case there is some advertisement
in every issue for the last year that contains those terms). When I click on
the desired result, the PDF is downloaded by my browser.

Scenario 2. "How much have I been paying for property taxes for the last
five years again?" (the bills are all scanned in)

In this case I'd like to search for my property identification number (which
is on the bills) and the results should show all the documents that have it,
with context. Clicking on results downloads the documents. I assume this
example is simple to achieve if example 1 can be done.

So in general, my question is - can this be done in a fairly straightforward
manner with Solr? Is there a more appropriate tool to be using (e.g. Nutch)?
Also, I have looked high and low for a free, already-baked solution which
can do scenario 1, but haven't been able to find one - so if someone knows
of such a thing, please let me know.

Thanks!

-Matt





Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Mark Mandel
Sounds great :) I'll sign up as well.

Look forward to a meeting!

Mark

On Thu, Jul 21, 2011 at 2:14 PM, Dave Hall dave.h...@skwashd.com wrote:

 Hi Tal,


 On 21/07/11 14:04, Tal Rotbart wrote:

 We have noticed that there is a blossoming of Apache Solr/Lucene usage
 & development in Melbourne in addition to a lack of an unofficial,
 relaxed gathering to allow some fruitful information and experience
 exchange.

 We're trying to put together a laid back meet up for developers (and
 other interested people) who are currently using Apache Solr (and/or
 Lucene) or would like to learn more about it.  Aiming for it to be a
 high signal/noise ratio group, with meet ups probably once every two
 months.


 This sounds great!  I'm not sure I'll be a regular, but if I'm around town
 when it is on I will try to drop in.


  The first meet up is still TBD, but please join the group if you're
 keen to join us for pizza, beer, and a discussion about Solr once we
 figure out the date of the first meeting.

 Once a date is decided please update the Melbourne *UG wiki page so others
 can find out about it.  The wiki has meeting times for various user groups
 around town, which might help you find a time which doesn't clash with other
 groups.  Check out at http://perl.net.au/wiki/Melbourne

 Cheers

 Dave




-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) + Flex - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au


Re: Announcement/Invitation: Melbourne Solr/Lucene Users Group

2011-07-20 Thread Ranveer Kumar
Hi,

I'm interested in attending, but I'm not in Australia. :-(

Regards
 On 21-Jul-2011 9:45 AM, Dave Hall dave.h...@skwashd.com wrote:
 Hi Tal,

 On 21/07/11 14:04, Tal Rotbart wrote:
 We have noticed that there is a blossoming of Apache Solr/Lucene usage
 & development in Melbourne in addition to a lack of an unofficial,
 relaxed gathering to allow some fruitful information and experience
 exchange.

 We're trying to put together a laid back meet up for developers (and
 other interested people) who are currently using Apache Solr (and/or
 Lucene) or would like to learn more about it. Aiming for it to be a
 high signal/noise ratio group, with meet ups probably once every two
 months.

 This sounds great! I'm not sure I'll be a regular, but if I'm around
 town when it is on I will try to drop in.

 The first meet up is still TBD, but please join the group if you're
 keen to join us for pizza, beer, and a discussion about Solr once we
 figure out the date of the first meeting.
 Once a date is decided please update the Melbourne *UG wiki page so
 others can find out about it. The wiki has meeting times for various
 user groups around town, which might help you find a time which doesn't
 clash with other groups. Check out at http://perl.net.au/wiki/Melbourne

 Cheers

 Dave