Re: Missing slf4j jar in solr 1.4.0 distribution?

2009-11-18 Thread Per Halvor Tryggeseth
Thanks. I see. It seems that slf4j-nop-1.5.5.jar is the only jar file missing 
in solrj-lib, so I suggest that it should be included in the next release.

Per Halvor





-Opprinnelig melding-
Fra: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sendt: 17. november 2009 20:51
Til: 'solr-user@lucene.apache.org'
Emne: Re: Missing slf4j jar in solr 1.4.0 distribution?


: I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a
: required slf4j jar was missing in the distribution (i.e.
: apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError:
: org/slf4j/impl/StaticLoggerBinder when using solrj
...
: Have I overlooked something or are not all necessary classes required
: for using solrj in solr 1.4.0 included in the distribution?

Regrettably, Solr releases aren't particularly consistent about where 
third-party libraries can be found.

If you use the pre-built war, the 'main' dependencies are already bundled 
into it.  If you want to roll your own, you need to look at the ./lib 
directory -- ./dist is only *supposed* to contain the artifacts built from 
Solr source (but that solrj-lib directory can be confusing)...

hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-*
lib/slf4j-api-1.5.5.jar lib/slf4j-jdk14-1.5.5.jar

-Hoss



Ruby serialization with dismax

2009-11-18 Thread Andrea Campi

Hi,

Not sure if this is something new in Solr 1.4, but I just noticed that 
facet results are serialized differently with standard and dismax when 
using wt=ruby.


Standard returns:

'my_facet'=>{'20344'=>1}

Whereas dismax has:

'my_facet'=>['20344',1]

Admittedly this is not a big deal, it's easy to work around, but it 
still feels strange.

Am I missing anything or is it a bug? In that case I'll file an issue.

Bye,
Andrea


HTMLStripCharFilterFactory does not replace &#233;

2009-11-18 Thread Kundig, Andreas
Hello

I indexed an HTML document with decimal HTML entity encodings: the character 
é (e with an acute accent) is encoded as &#233;. The exact content of the 
document is:

<html><body>&#231;a va m&#233;m&#233; ?</body></html>

A search for 'mémé' returns no document. If I put the line above in Solr 
admin's analysis.jsp it also doesn't match mémé. There is only a match if I 
replace &#233; by é.

This is how I configured the fieldType:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I tried avoiding the problem by using the MappingCharFilterFactory:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I put the file mapping.txt in the conf directory. It contains just this:

"&#233;" => "é"

This doesn't work either. How can I get this to work?
(I am using solr 1.4.0)

thank you
Andréas Kündig

World Intellectual Property Organization Disclaimer:

This electronic message may contain privileged, confidential and
copyright protected information. If you have received this e-mail
by mistake, please immediately notify the sender and delete this
e-mail and all its attachments. Please ensure all e-mail attachments
are scanned for viruses prior to opening or using.


Index-time field boosting not working?

2009-11-18 Thread Ian Smith
I have the following field configured in schema.xml:

<field name="title" type="text" indexed="true" stored="true"
       omitNorms="false" boost="3.0" />

Where text is the type which came with the Solr distribution.  I have
not been able to get this configuration to alter any document scores,
and if I look at the indexes in Luke there is no change in the norms
(compared to an un-boosted equivalent).

I have confirmed that document boosting works (via SolrJ), but our field
boosting needs to be done in the schema.

Am I doing something wrong (BTW I have tried using 3.0f as well, no
difference)?

Also, I have seen no debug output during startup which would indicate
that field boosting is being configured - should there be any?

I have found no usage examples of this in the Solr 1.4 book, except a
vague discouragement - is this a deprecated feature?

TIA,

Ian

Web design and intelligent Content Management. www.twitter.com/gossinteractive 

Registered Office: c/o Bishop Fleming, Cobourg House, Mayflower Street, 
Plymouth, PL1 1LG.  Company Registration No: 3553908 

This email contains proprietary information, some or all of which may be 
legally privileged. It is for the intended recipient only. If an addressing or 
transmission error has misdirected this email, please notify the author by 
replying to this email. If you are not the intended recipient you may not use, 
disclose, distribute, copy, print or rely on this email. 

Email transmission cannot be guaranteed to be secure or error free, as 
information may be intercepted, corrupted, lost, destroyed, arrive late or 
incomplete or contain viruses. This email and any files attached to it have 
been checked with virus detection software before transmission. You should 
nonetheless carry out your own virus check before opening any attachment. GOSS 
Interactive Ltd accepts no liability for any loss or damage that may be caused 
by software viruses.




Re: Ruby serialization with dismax

2009-11-18 Thread Erik Hatcher

Andrea,

I'd guess you have json.nl=arrarr set for your dismax handler (or  
request).


Erik

On Nov 18, 2009, at 12:01 PM, Andrea Campi wrote:


Hi,

not sure this is something new in Solr 1.4, but I just noticed that  
facets results are serialized differently with standard and dismax  
when using wt=ruby.


Standard returns:

'my_facet'=>{'20344'=>1}

Whereas dismax has:

'my_facet'=>['20344',1]

Admittedly this is not a big deal, it's easy to work around, but it  
still feels strange.

Am I missing anything or is it a bug? In that case I'll file an issue.

Bye,
   Andrea
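The two shapes above are easy to normalize on the client side regardless of the json.nl setting. A minimal Python sketch (the shapes are taken from this thread; facet_counts is an illustrative helper, not part of any Solr client library):

```python
# Normalize Solr facet counts that may arrive either as a mapping
# (the map-style output shown for the standard handler) or as a flat
# alternating name/value list (the array shown for dismax above).

def facet_counts(raw):
    """Return facet counts as a dict, whichever serialization Solr used."""
    if isinstance(raw, dict):                  # {'20344': 1}
        return dict(raw)
    if isinstance(raw, list):                  # ['20344', 1, ...]
        return {raw[i]: raw[i + 1] for i in range(0, len(raw), 2)}
    raise TypeError("unexpected facet serialization: %r" % (raw,))

print(facet_counts({'20344': 1}))   # {'20344': 1}
print(facet_counts(['20344', 1]))   # {'20344': 1}
```

With a helper like this, client code is insulated from whichever json.nl/wt combination a given handler happens to be configured with.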




Re: Ruby serialization with dismax

2009-11-18 Thread Andrea Campi

Erik,

Erik Hatcher wrote:

Andrea,

I'd guess you have json.nl=arrarr set for your dismax handler (or 
request).

sigh, you're right, sorry for the noise :/

Andrea


VelocityResponseWriter/Solritas character encoding issue

2009-11-18 Thread Sascha Szott

Hi,

I've played around with Solr's VelocityResponseWriter (which is indeed a 
very useful feature for rapid prototyping). I've realized that Velocity 
uses ISO-8859-1 as default character encoding. I've changed this setting 
to UTF-8 in my velocity.properties file (inside the conf directory), i.e.,


   input.encoding=UTF-8
   output.encoding=UTF-8

and checked that the settings were successfully loaded.

Within the main Velocity template, browse.vm, the character encoding is 
set to UTF-8 as well, i.e.,


   <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

After starting Solr (which is deployed in a Tomcat 6 server on a Ubuntu 
machine), I ran into some character encoding problems.


Due to the change of input.encoding to UTF-8, no problems occur when 
non-ASCII characters are present in the query string, e.g. German 
umlauts. But unfortunately, something is wrong with the encoding of 
characters in the HTML page that is generated by VelocityResponseWriter. 
The non-ASCII characters aren't displayed properly (for example, Firefox 
prints a black diamond with a white question mark). If I manually set 
the encoding to ISO-8859-1, the non-ASCII characters are displayed 
correctly. Does anybody have a clue?


Thanks in advance,
Sascha
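The "black diamond with a white question mark" is the browser's rendering of U+FFFD, which appears when bytes in one encoding are decoded as another. The mismatch can be reproduced outside Solr and Velocity with plain Python (a sketch of the two failure directions, nothing Solr-specific):

```python
# Reproduce the symptom: text encoded in one charset but decoded in another.
text = "mémé"

# UTF-8 bytes misread as ISO-8859-1: every byte becomes its own
# character ("mojibake").
print(text.encode("utf-8").decode("iso-8859-1"))   # mÃ©mÃ©

# ISO-8859-1 bytes misread as UTF-8: invalid sequences become U+FFFD,
# the replacement character browsers draw as a black diamond.
print(text.encode("iso-8859-1").decode("utf-8", errors="replace"))   # m�m�
```

The replacement characters reported here match the second direction: the template engine emitting ISO-8859-1 bytes into a response the browser decodes as UTF-8.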









Re: VelocityResponseWriter/Solritas character encoding issue

2009-11-18 Thread Sascha Szott

Hi Erik,

Erik Hatcher wrote:
Can you give me a test document that causes an issue?  (maybe send me a 
Solr XML document in private e-mail).   I'll see what I can do once I 
can see the issue first hand.
Thank you! Just try the utf8-example.xml file in the exampledocs 
directory. After having indexed the document, the output of the script 
test_utf8.sh suggests to me that everything works correctly:


 Solr server is up.
 HTTP GET is accepting UTF-8
 HTTP POST is accepting UTF-8
 HTTP POST does not default to UTF-8
 HTTP GET is accepting UTF-8 beyond the basic multilingual plane
 HTTP POST is accepting UTF-8 beyond the basic multilingual plane
 HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual

If I'm using the standard QueryResponseWriter and the query q=umlauts, 
the responding xml page contains properly printed non-ASCII characters. 
The same query against the VelocityResponseWriter returns a lot of 
Unicode replacement characters (u+FFFD) instead.


-Sascha



On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:


Hi,

I've played around with Solr's VelocityResponseWriter (which is indeed 
a very useful feature for rapid prototyping). I've realized that 
Velocity uses ISO-8859-1 as default character encoding. I've changed 
this setting to UTF-8 in my velocity.properties file (inside the conf 
directory), i.e.,


  input.encoding=UTF-8
  output.encoding=UTF-8

and checked that the settings were successfully loaded.

Within the main Velocity template, browse.vm, the character encoding 
is set to UTF-8 as well, i.e.,


  <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

After starting Solr (which is deployed in a Tomcat 6 server on a 
Ubuntu machine), I ran into some character encoding problems.


Due to the change of input.encoding to UTF-8, no problems occur when 
non-ASCII characters are present in the query string, e.g. German 
umlauts. But unfortunately, something is wrong with the encoding of 
characters in the HTML page that is generated by 
VelocityResponseWriter. The non-ASCII characters aren't displayed 
properly (for example, Firefox prints a black diamond with a white 
question mark). If I manually set the encoding to ISO-8859-1, the 
non-ASCII characters are displayed correctly. Does anybody have a clue?


Thanks in advance,
Sascha











Re: Missing slf4j jar in solr 1.4.0 distribution?

2009-11-18 Thread Ryan McKinley
Solr includes slf4j-jdk14-1.5.5.jar, if you want to use the nop (or  
log4j, or loopback) impl you will need to include that in your own  
project.


Solr uses slf4j so that each user can decide on their logging  
implementation. It includes the JDK version so that something works  
off-the-shelf, but if you want more control, you can switch in  
whatever you want.


ryan


On Nov 18, 2009, at 1:22 AM, Per Halvor Tryggeseth wrote:

Thanks. I see. It seems that slf4j-nop-1.5.5.jar is the only jar  
file missing in solrj-lib, so I suggest that it should be included  
in the next release.


Per Halvor





-Opprinnelig melding-
Fra: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sendt: 17. november 2009 20:51
Til: 'solr-user@lucene.apache.org'
Emne: Re: Missing slf4j jar in solr 1.4.0 distribution?


: I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a
: required slf4j jar was missing in the distribution (i.e.
: apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError:
: org/slf4j/impl/StaticLoggerBinder when using solrj
   ...
: Have I overlooked something or are not all necessary classes  
required

: for using solrj in solr 1.4.0 included in the distribution?

Regrettably, Solr releases aren't particularly consistent about where  
third-party libraries can be found.


If you use the pre-built war, the 'main' dependencies are already  
bundled into it.  If you want to roll your own, you need to look at  
the ./lib directory -- ./dist is only *supposed* to contain the  
artifacts built from Solr source (but that solrj-lib directory can be  
confusing)...


hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-*
lib/slf4j-api-1.5.5.jar lib/slf4j-jdk14-1.5.5.jar

-Hoss
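Ryan's description of slf4j as a facade with swappable bindings has a direct stdlib analogue in Python's logging module, which may help if slf4j is unfamiliar (an analogy sketch only -- the logger and handler names here are illustrative, not slf4j or SolrJ API):

```python
# A library logs against a facade; the application chooses the binding.
# NullHandler plays the role of slf4j-nop; a concrete handler plays the
# role of slf4j-jdk14 or a log4j binding.
import logging

lib_log = logging.getLogger("solrj.analogy")   # the library's facade logger
lib_log.addHandler(logging.NullHandler())      # "nop": discard output quietly

records = []

class ListHandler(logging.Handler):
    """A stand-in 'binding' chosen by the application, not the library."""
    def emit(self, record):
        records.append(record.getMessage())

lib_log.addHandler(ListHandler())              # swap in a real implementation
lib_log.warning("commit took %d ms", 42)
print(records)   # ['commit took 42 ms']
```

The library code never names a concrete implementation, which is exactly why Solr can ship one binding jar and let you replace it on the classpath.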





Re: initiate reindexing in solr for field type changes

2009-11-18 Thread darniz

Thanks
So going by your reply, can I assume that if there is a configuration change
to my schema I have to index my documents again? There is no shortcut for
updating the index? We can't afford to index 2 million documents again and
again; there should be some utility or command-line tool which does these
things in the background.

I hope I make sense.

darniz
-- 
View this message in context: 
http://old.nabble.com/initiate-reindexing-in-solr-for-field-type-changes-tp26397067p26413172.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: UTF-8 Character Set not specified on OutputStreamWriter in StreamingUpdateSolrServer

2009-11-18 Thread Joe Kessel

Specifying the file.encoding did work, although I don't think it is a suitable 
workaround for my use case.  Any idea what my next step is to get a bug 
opened?

 

Thanks,

Joe
 
 Date: Wed, 18 Nov 2009 16:15:55 +0530
 Subject: Re: UTF-8 Character Set not specified on OutputStreamWriter in 
 StreamingUpdateSolrServer
 From: shalinman...@gmail.com
 To: solr-user@lucene.apache.org
 
 On Wed, Nov 18, 2009 at 6:56 AM, Joe Kessel isjust...@hotmail.com wrote:
 
 
  While trying to make use of the StreamingUpdateSolrServer for updates with
  the release code for Solr 1.4, I noticed some characters such as é did not
  show up in the index correctly. The code should set the CharsetName via the
  constructor of the OutputStreamWriter. I noticed that the
  CommonsHttpSolrServer seems to set the charset to UTF-8. As a workaround I
  am able to use the CommonsHttpSolrServer. Being new to Solr, not sure what
  the bug protocol is, assuming this is a bug.
 
 
 I wrote a simple test case and I'm able to index and query 'é' and other
 characters using StreamingUpdateSolrServer. Can you use -Dfile.encoding=UTF8
 as a JVM parameter and see if that fixes your case. If it does, then it may
 be a Solr bug.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
  
_
Hotmail: Trusted email with powerful SPAM protection.
http://clk.atdmt.com/GBL/go/177141665/direct/01/
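The root cause described in this thread is that an OutputStreamWriter constructed without a charset argument falls back to the JVM's platform default (file.encoding), which is why the -Dfile.encoding=UTF8 flag works around it. The same trap can be sketched in Python (write_doc and the forced "ascii" default are illustrative assumptions for this sketch, not Solr code):

```python
import io

def write_doc(buf, text, encoding=None):
    # encoding=None mirrors `new OutputStreamWriter(out)`: the charset
    # silently falls back to a platform default ("ascii" here, to make
    # the failure deterministic in this sketch).
    writer = io.TextIOWrapper(buf, encoding=encoding or "ascii",
                              errors="replace")
    writer.write(text)
    writer.flush()
    writer.detach()          # keep the underlying buffer open
    return buf.getvalue()

print(write_doc(io.BytesIO(), "é"))                    # b'?'  -- data lost
print(write_doc(io.BytesIO(), "é", encoding="utf-8"))  # b'\xc3\xa9'
```

Pinning the charset at the writer's construction site, as CommonsHttpSolrServer apparently does, removes the dependency on the environment entirely.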

Re: VelocityResponseWriter/Solritas character encoding issue

2009-11-18 Thread Lance Norskog
What platform are you using? Windows does not use UTF-8 by default,
and this can cause subtle problems. If you can do the same thing on
other platforms (Linux, Mac) that would help narrow down the problem.

On Wed, Nov 18, 2009 at 8:15 AM, Sascha Szott sz...@zib.de wrote:
 Hi Erik,

 Erik Hatcher wrote:

 Can you give me a test document that causes an issue?  (maybe send me a
 Solr XML document in private e-mail).   I'll see what I can do once I can
 see the issue first hand.

 Thank you! Just try the utf8-example.xml file in the exampledocs directory.
 After having indexed the document, the output of the script test_utf8.sh
 suggests to me that everything works correctly:

  Solr server is up.
  HTTP GET is accepting UTF-8
  HTTP POST is accepting UTF-8
  HTTP POST does not default to UTF-8
  HTTP GET is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST is accepting UTF-8 beyond the basic multilingual plane
  HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual

 If I'm using the standard QueryResponseWriter and the query q=umlauts, the
 responding xml page contains properly printed non-ASCII characters. The same
 query against the VelocityResponseWriter returns a lot of Unicode
 replacement characters (u+FFFD) instead.

 -Sascha


 On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:

 Hi,

 I've played around with Solr's VelocityResponseWriter (which is indeed a
 very useful feature for rapid prototyping). I've realized that Velocity uses
 ISO-8859-1 as default character encoding. I've changed this setting to UTF-8
 in my velocity.properties file (inside the conf directory), i.e.,

  input.encoding=UTF-8
  output.encoding=UTF-8

 and checked that the settings were successfully loaded.

 Within the main Velocity template, browse.vm, the character encoding is
 set to UTF-8 as well, i.e.,

  <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

 After starting Solr (which is deployed in a Tomcat 6 server on a Ubuntu
 machine), I ran into some character encoding problems.

 Due to the change of input.encoding to UTF-8, no problems occur when
 non-ASCII characters are present in the query string, e.g. German umlauts.
 But unfortunately, something is wrong with the encoding of characters in the
 HTML page that is generated by VelocityResponseWriter. The non-ASCII
 characters aren't displayed properly (for example, Firefox prints a black
 diamond with a white question mark). If I manually set the encoding to
 ISO-8859-1, the non-ASCII characters are displayed correctly. Does anybody
 have a clue?

 Thanks in advance,
 Sascha












-- 
Lance Norskog
goks...@gmail.com


Re: Missing slf4j jar in solr 1.4.0 distribution?

2009-11-18 Thread Jason Rutherglen
Ah, thanks for the tip about switching out the JDK jar for the
log4j jar. I think I was running into this issue and couldn't
figure out why Solr logging couldn't be configured when running
inside Hadoop, which uses log4j -- maybe this was the issue?

On Wed, Nov 18, 2009 at 9:11 AM, Ryan McKinley ryan...@gmail.com wrote:
 Solr includes slf4j-jdk14-1.5.5.jar, if you want to use the nop (or log4j,
 or loopback) impl you will need to include that in your own project.

 Solr uses slf4j so that each user can decide on their logging implementation.
 It includes the JDK version so that something works off-the-shelf, but if
 you want more control, you can switch in whatever you want.

 ryan


 On Nov 18, 2009, at 1:22 AM, Per Halvor Tryggeseth wrote:

 Thanks. I see. It seems that slf4j-nop-1.5.5.jar is the only jar file
 missing in solrj-lib, so I suggest that it should be included in the next
 release.

 Per Halvor





 -Opprinnelig melding-
 Fra: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sendt: 17. november 2009 20:51
 Til: 'solr-user@lucene.apache.org'
 Emne: Re: Missing slf4j jar in solr 1.4.0 distribution?


 : I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a
 : required slf4j jar was missing in the distribution (i.e.
 : apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError:
 : org/slf4j/impl/StaticLoggerBinder when using solrj
       ...
 : Have I overlooked something or are not all necessary classes required
 : for using solrj in solr 1.4.0 included in the distribution?

 Regrettably, Solr releases aren't particularly consistent about where
 third-party libraries can be found.

 If you use the pre-built war, the 'main' dependencies are already
 bundled into it.  If you want to roll your own, you need to look at the
 ./lib directory -- ./dist is only *supposed* to contain the artifacts
 built from Solr source (but that solrj-lib directory can be confusing)...

 hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-*
 lib/slf4j-api-1.5.5.jar         lib/slf4j-jdk14-1.5.5.jar

 -Hoss





Re: initiate reindexing in solr for field type changes

2009-11-18 Thread Otis Gospodnetic
Darniz,

Yes, if there is an incompatible schema change, you need to reindex your 
documents.

Otis
P.S.
Please include the copy of the response when replying, so the 
context/background of your question is easy to figure out.
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: darniz rnizamud...@edmunds.com
 To: solr-user@lucene.apache.org
 Sent: Wed, November 18, 2009 1:30:06 PM
 Subject: Re: initiate reindexing in solr for field type changes
 
 
 Thanks
 So going by your reply, can I assume that if there is a configuration change
 to my schema I have to index my documents again? There is no shortcut for
 updating the index? We can't afford to index 2 million documents again and
 again; there should be some utility or command-line tool which does these
 things in the background.
 
 I hope I make sense.
 
 darniz
 -- 
 View this message in context: 
 http://old.nabble.com/initiate-reindexing-in-solr-for-field-type-changes-tp26397067p26413172.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: HTMLStripCharFilterFactory does not replace &#233;

2009-11-18 Thread Koji Sekiguchi

Your first definition of text_fr seems to be correct and should work
as expected. I tested it and it worked fine ("mémé" was highlighted).

What was the output of HTMLStripCharFilterFactory in analysis.jsp?
In my analysis.jsp, I got "ça va mémé ?".

Koji


Kundig, Andreas wrote:

Hello

I indexed an HTML document with decimal HTML entity encodings: the character é (e 
with an acute accent) is encoded as &#233;. The exact content of the document is:

<html><body>&#231;a va m&#233;m&#233; ?</body></html>

A search for 'mémé' returns no document. If I put the line above in Solr admin's 
analysis.jsp it also doesn't match mémé. There is only a match if I replace 
&#233; by é.

This is how I configured the fieldType:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I tried avoiding the problem by using the MappingCharFilterFactory:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I put the file mapping.txt in the conf directory. It contains just this:

"&#233;" => "é"

This doesn't work either. How can I get this to work?
(I am using solr 1.4.0)

thank you
Andréas Kündig


  



--
http://www.rondhuit.com/en/
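For reference, what decoding the numeric entities in the test document should produce can be checked with the Python stdlib (a sketch only; the regex tag strip is a crude stand-in for HTMLStripCharFilterFactory, whose output Koji confirms above):

```python
import html
import re

raw = "<html><body>&#231;a va m&#233;m&#233; ?</body></html>"

stripped = re.sub(r"<[^>]+>", "", raw)   # crude tag removal, illustration only
decoded = html.unescape(stripped)        # &#231; -> ç, &#233; -> é
print(decoded)   # ça va mémé ?
```

If analysis.jsp shows anything other than this decoded text after the char filter stage, the filter configuration (rather than the tokenizer) is the place to look.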



Re: Disable coord

2009-11-18 Thread Guangwei Yuan
Thanks for your reply.  Nested boolean queries are a valid concern.  I also
realized that isCoordDisabled needs to be considered in
BooleanQuery.hashCode so that a query with coord=false will have a different
cache key in Solr.

On Thu, Nov 12, 2009 at 12:12 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I want to disable coord for certain queries.  For example, if I pass a
 URL
 : parameter like disableCoord to Solr, the BooleanQuery generated will
 have
 : coord disabled.  If it's not currently supported, what would be a good
 way
 : to implement it?

 In order to have something like this on a per-query basis it needs to be
 handled by the query parsers.  The Lucene QueryParser doesn't provide any
 syntax markup to do this, so you would have to add your own -- you could
 subclass the LuceneQParserPlugin and just have it *always* ignore the coord
 if some query param coord=false was set, but you'd have to be careful
 about whether that's really what you want in a deeply nested set of boolean
 queries -- ie:   (A +B -C +(D E F G H) ((X Y Z) (L M (N O P ... what
 if you only want to disable the coord on the (X Y Z) boolean query?

 :
 : Thanks,
 : Guangwei
 :



 -Hoss
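Guangwei's closing point generalizes: any flag that changes how a query scores must participate in hashCode/equals, otherwise the coord=true and coord=false variants collide in Solr's query result cache. A small Python sketch of the pitfall (class and field names are illustrative, not Lucene API):

```python
class BooleanQueryKey:
    """Cache key for a query; the coord flag is part of its identity."""

    def __init__(self, clauses, coord_disabled=False):
        self.clauses = tuple(clauses)
        self.coord_disabled = coord_disabled

    def __eq__(self, other):
        return (isinstance(other, BooleanQueryKey)
                and self.clauses == other.clauses
                and self.coord_disabled == other.coord_disabled)

    def __hash__(self):
        # Omitting coord_disabled here would let the two variants share
        # one cache slot and return each other's results.
        return hash((self.clauses, self.coord_disabled))

cache = {BooleanQueryKey(["A", "B"]): "scored WITH coord"}
print(BooleanQueryKey(["A", "B"], coord_disabled=True) in cache)   # False
print(BooleanQueryKey(["A", "B"]) in cache)                        # True
```

This mirrors the Java contract: whatever fields go into equals must go into hashCode, and both must cover every field that affects query semantics.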