Re: Preserve XML hierarchy

2011-07-15 Thread Gora Mohanty
On Thu, Jul 14, 2011 at 8:43 PM, Lucas Miguez lucas.mig...@gmail.com wrote:
 Thanks for your help!

 DIH XPathEntityProcessor helps me to index the XML Files, but, does it
 help to me to know from where the node comes? Following the example in
 my previous post:

 example: Imagine that the user search the word zona, then I have to
 show the TitleP, the TextP, the TitlePart, the TextPart and all the
 TextSubPart that are childs of gSubPart.

 Well, I tried to create TextPart, TitlePart, etc with the XPath
 expression of the location in the original XML, using dynamic fields,
 for example:
 <dynamicField name="TextPart *" multivalued="true" indexed="true" ... />

There should not be a space between TextPart and *

 to have the XPath associated with the field, but I don't know how to
 search in all TextPart * fields...
[...]

You can search in individual fields, e.g., with ?q=TitlePart:myterm.
For searching in all TextPart* fields, the easiest way is probably
to copy the fields into a full-text search field. With the default Solr
schema, this can be done by adding a directive like
   <copyField source="TextPart*" dest="text"/>
This copies all TextPart* fields into the field "text", which is searched by
default. Thus, ?q=myterm will find myterm in all TextPart*
fields.

Regards,
Gora


Re: SolrJ Collapsable Query Fails

2011-07-15 Thread Kurt Sultana
Hi,

Thanks for the information. However, I still have one more problem. I am
iterating over the values of the NamedList. I have 2 values, one
being 'responseHeader' and the other one being 'grouped'. I would like to
access some information stored within the grouped section, which has
data structured like so:

grouped={attr_directory={matches=4,groups=[{groupValue=C:\Users\rvassallo\Desktop\Index,doclist={numFound=2,start=0,docs=[SolrDocument[{attr_meta=[Author,
kcook, Last-Modified, 2011-03-02T14:14:18Z...

With the 'get(group)' method I am only able to access the entire
'{attr_directory={matches=4,g...' section. Is there some functionality which
allows me to get other data? Something like this for instance:
'get(group.matches)' or maybe 'get(group.attr_directory.matches)' (which
will yield the value of 4), or do I need to process the String that the
'get(...)' returns to get what I need?

Thanks :)

On Thu, Jul 14, 2011 at 12:52 PM, Ahmet Arslan iori...@yahoo.com wrote:


 See Yonik's reply: http://search-lucene.com/m/tCmky1v94D92/

 In short you need to use NamedList<Object> getResponse().

  I am currently trying to run a
  collapsable query using SolrJ using SolR 3.3.
  The problem is that when I run the query through the web
  interface, with
  this url:
 
 
 http://localhost:8080/solr/select/?q=attr_content%3Alynx&sort=attr_location+desc&group=true&group.field=attr_directory
 
  I am able to see the XML which is returned. The problem
  though, is that when
  I try to run the same query through SolrJ, using this
  code:
 
  SolrQuery queryString = new SolrQuery();

  for (String param : query.keySet()) {
      if (param.equals("fq")) {
          queryString.addFilterQuery(query.get(param));
      } else {
          queryString.setParam(param, query.get(param));
      }
  }
 
 
  System.out.println(queryString.toString());

  // Exception takes place at this line:
  QueryResponse response = server.query(queryString);

  SolrDocumentList docList = response.getResults();
 
  Which constructs a URL like so:
 
 
 q=attr_content%3Alynx&sort=attr_location+desc&group=true&group.field=attr_directory
 
  This throws an exception:
 
  Caused by: org.apache.solr.common.SolrException: parsing
  error at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:145)
  at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:106)
  at
 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:477)
  at
 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
  at
 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
  ... 3 more Caused by: javax.xml.stream.XMLStreamException:
  ParseError at
  [row,col]:[3,30088] Message: error reading value:LST at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:324)
  at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:245)
  at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:244)
  at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:244)
  at
 
 org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:130)
 
  I have tried it with both Jetty and Tomcat, the error is
  the same for both.
  I have managed to get other queries to run (with both
  servers), so I presume
  that the problem lies with this particular type of query.
 
  Any insight on this problem will be highly appreciated,
 
  Thanks :)
 



POST VS GET and NON English Characters

2011-07-15 Thread Sujatha Arun
Hello,

We have implemented Solr search in several languages. Initially we used the
GET method for querying, but later moved to the POST method to accommodate
lengthy queries.

When we moved from GET to POST, the German characters could no
longer be searched, and I had to use the function utf8_decode in my
application for the search to work for German characters.

Currently I am doing this while querying using the POST method; we are using
the standard request handler:


$this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
$this->_queryterm);


This makes the query work for German characters and other languages, but it
does not work for certain characters in Lithuanian and Spanish. Example:
Not working:

   - Iš
   - Estremadūros
   - sNaująjį
   - MEDŽIAGOTYRA
   - MEDŽIAGOS
   - taškuose

Working:

   - garbę
   - ieškoti
   - ispanų

Any ideas / input?

Regards
Sujatha


Re: POST VS GET and NON English Characters

2011-07-15 Thread pankaj bhatt
Hi Arun,
  This looks like an encoding issue to me.
   Can you change your browser settings to UTF-8 and hit the search URL
via the GET method?

We faced a similar problem with Chinese and Korean languages; this
solved the problem.

/ Pankaj Bhatt.

2011/7/15 Sujatha Arun suja.a...@gmail.com

 Hello,

 We have implemented solr search in  several languages .Intially we used the
 GET method for querying ,but later moved to  POST method to accomodate
 lengthy queries .

 When we moved form  GET TO POSt method ,the german characteres could no
 longer be searched and I had to use the fucntion utf8_decode in my
 application  for the search to work for german characters.

 Currently I am doing this  while quering using the POST method ,we are
 using
 the standard Request Handler


  $this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
  $this->_queryterm);


 This makes the query work for german characters and other languages but
 does
 not work for certain charactes  in Lithuvanian and spanish.Example:
 Not working:

   - Iš
   - Estremadūros
   - sNaująjį
   - MEDŽIAGOTYRA
   - MEDŽIAGOS
   - taškuose

 Working:

   - garbę
   - ieškoti
   - ispanų

 Any ideas /input  ?

 Regards
 Sujatha



Re: SolrCloud Sharding

2011-07-15 Thread Jamie Johnson
Thanks Shalin.  I don't necessarily have an issue running off this
patch, but before I do that or implement my own sharding logic I
wonder if you could let me know your thoughts on the stability of the
patch?  How well it works, basically.

On Thu, Jul 14, 2011 at 4:51 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 On Thu, Jul 14, 2011 at 12:29 AM, Jamie Johnson jej2...@gmail.com wrote:

 Reading the SolrCloud wiki I see that there are goals to support
 different sharding algorithms; what is currently implemented today?
 Is the sharding logic the responsibility of the application doing the
 indexing?


 Nothing has been committed to trunk yet. So, right now, sharding is the
 responsibility of the client.

 You may want to follow the jira issue:
 https://issues.apache.org/jira/browse/SOLR-2341

 --
 Regards,
 Shalin Shekhar Mangar.



Need Suggestion

2011-07-15 Thread Rohit Gupta
I am facing some performance issues on my Solr installation (3-core server).
I am indexing live Twitter data based on certain keywords; as you can
imagine, the rate at which documents are received is very high, and so the
updates to the cores are very high and regular. Given below are the document
counts for my three cores.

Twitter - 26874747
Core2   - 3027800
Core3   - 6074253

My server configuration has 8GB RAM, but now we are experiencing a server
performance drop. What can be done to improve this?  Also, I have a few
questions.

1. Does a high number of commits take a lot of memory? Will reducing the
number of commits per hour help?
2. Most of my queries are field- or date-faceting based; how can I improve
those?

Regards,
Rohit

Need Suggestion

2011-07-15 Thread Rohit
I am facing some performance issues on my Solr installation (3-core server).
I am indexing live Twitter data based on certain keywords; as you can
imagine, the rate at which documents are received is very high, and so the
updates to the cores are very high and regular. Given below are the document
counts for my three cores.

 

Twitter  - 26874747

Core2-  3027800

Core3-  6074253

 

My server configuration has 8GB RAM, but now we are experiencing a server
performance drop. What can be done to improve this?  Also, I have a few
questions.

 

1.  Does a high number of commits take a lot of memory? Will reducing the
number of commits per hour help?
2.  Most of my queries are field- or date-faceting based; how can I improve
those?

 

Regards,

Rohit

Mobile: +91-9901768202

About Me: http://about.me/rohitg

 



Re: Need Suggestion

2011-07-15 Thread Mohammad Shariq
Below are certain things you can do to reduce search latency:
1) Do bulk inserts (a rough sketch follows below).
2) Commit after every ~5000 docs.
3) Optimize once a day.
4) In search queries, use the fq parameter.
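
For example, items 1 and 2 with SolrJ might look roughly like this (an
untested sketch; the URL, field names and document source are placeholders,
not your actual setup):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    private static final int BATCH_SIZE = 5000;

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> buffer = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 20000; i++) {       // stand-in for the tweet stream
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(i));
            doc.addField("text", "tweet body " + i);
            buffer.add(doc);
            if (buffer.size() >= BATCH_SIZE) {
                server.add(buffer);             // one bulk add instead of 5000 single adds
                server.commit();                // commit once per batch, not per document
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {                // flush the remainder
            server.add(buffer);
            server.commit();
        }
    }
}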

What JVM heap size are you using?





On 15 July 2011 17:44, Rohit ro...@in-rev.com wrote:

 I am facing some performance issues on my Solr Installation (3core server).
 I am indexing live twitter data based on certain keywords, as you can
 imagine, the rate at which documents are received is very high and so the
 updates to the core is very high and regular. Given below are the document
 size on my three core.



 Twitter  - 26874747

 Core2-  3027800

 Core3-  6074253



 My Server configuration has 8GB RAM, but now we are experiencing server
 performance drop. What can be done to improve this?  Also, I have a few
 questions.



 1.  Does the number of commit takes high memory? Will reducing the
 number of commits per hour help?
 2.  Most of my queries are field or date faceting based? how to improve
 those?



 Regards,

 Rohit





 Regards,

 Rohit

 Mobile: +91-9901768202

 About Me: http://about.me/rohitg






-- 
Thanks and Regards
Mohammad Shariq


Re: SolrJ Collapsable Query Fails

2011-07-15 Thread Ahmet Arslan


 Thanks for the information. However, I still have one more
 problem. I am
 iterating over the values of the NamedList. I have 2
 values, one
 being 'responseHeader' and the other one being 'grouped'. I
 would like to
 access some information stored within the grouped section,
 which has
 data structured like so:
 
 grouped={attr_directory={matches=4,groups=[{groupValue=C:\Users\rvassallo\Desktop\Index,doclist={numFound=2,start=0,docs=[SolrDocument[{attr_meta=[Author,
 kcook, Last-Modified, 2011-03-02T14:14:18Z...
 
 With the 'get(group)' method I am only able to access the
 entire
 '{attr_directory={matches=4,g...' section. Is there some
 functionality which
 allows me to get other data? Something like this for
 instance:
 'get(group.matches)' or maybe
 'get(group.attr_directory.matches)' (which
 will yield the value of 4), or do I need to process the
 String that the
 'get(...)' returns to get what I need?
 
 Thanks :)

I think accessing the relevant portion of a NamedList is troublesome. I suggest
you look at existing code in SolrJ, e.g., how facet info is extracted from a
NamedList.
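
For the raw response, a minimal client-side sketch along these lines might
work (untested, and assuming the grouped structure you pasted above):

NamedList<Object> raw = response.getResponse();
NamedList<?> grouped = (NamedList<?>) raw.get("grouped");
NamedList<?> attrDir = (NamedList<?>) grouped.get("attr_directory");
Integer matches = (Integer) attrDir.get("matches");  // 4 in your example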

I am sending you the piece of code that I used to access grouped info.
Hopefully it can give you some idea.

 NamedList signature = (NamedList) groupedInfo.get("attr_directory");

if (signature == null) return new ArrayList(0);

matches.append(signature.get("matches"));


@SuppressWarnings("unchecked")
ArrayList<NamedList> groups = (ArrayList<NamedList>) signature.get("groups");

ArrayList resultItems = new ArrayList(groups.size());

StringBuilder builder = new StringBuilder();


for (NamedList res : groups) {

  ResultItem resultItem = null;

  String hash = null;
  Integer found = null;
  for (int i = 0; i < res.size(); i++) {
    String n = res.getName(i);

    Object o = res.getVal(i);

    if ("groupValue".equals(n)) {
      hash = (String) o;
    } else if ("doclist".equals(n)) {
      DocList docList = (DocList) o;
      found = docList.matches();

      try {
        final SolrDocumentList list =
            SolrPluginUtils.docListToSolrDocumentList(docList, searcher, fields, null);
        builder.setLength(0);

        if (list.size() > 0)
          resultItem = solrDocumentToResultItem(list.get(0), debug);

        for (final SolrDocument document : list)
          builder.append(document.getFieldValue("id")).append(',');

      } catch (final IOException e) {
        LOG.error("Unexpected Error", e);
      }
    }

  }

  if (found != null && found > 1 && resultItem != null) {
    resultItem.setHash(hash);
    resultItem.setFound(found);
    builder.setLength(builder.length() - 1);
    resultItem.setId(builder.toString());
  }

  // debug

  resultItems.add(resultItem);
}

return resultItems;


Re: deletedPkQuery fails

2011-07-15 Thread Juan Grande
Hi Elaine, I think you have a syntax error in your query. I'd recommend
first trying the query in a SQL client until you get it right.

This part seems strange to me:

and pl.deleted='' having count(*)=0

Also note that in the logged query, ${dataimporter.delta.id} has resolved to
an empty string ("where p.pId=  and ..."), which by itself makes the SQL
invalid.

*Juan*



On Wed, Jul 13, 2011 at 5:09 PM, Elaine Li elaine.bing...@gmail.com wrote:

 Hi Folks,

 I am trying to use the deletedPkQuery to enable deltaImport to remove
 the inactive products from solr.
 I am keeping getting the syntax error saying the query syntax is not
 right. I have tried many alternatives to the following query. Although
 all of them work in the mysql prompt directly, no one works in solr
 handler. Can anyone give me some hint to debug this type of problem?
 Is there anything special about deletedPkQuery I am not aware of?

 deletedPkQuery=select p.pId as id from products p join products_large
 pl on p.pId=pl.pId where p.pId= ${dataimporter.delta.id} and
 pl.deleted='' having count(*)=0

 Jul 13, 2011 4:02:23 PM
 org.apache.solr.handler.dataimport.DataImporter doDeltaImport
 SEVERE: Delta Import Failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
 to execute query: select p.pId as id from products p join
 products_large pl on p.pId=pl.pI
 d where p.pId=  and pl.deleted='' having count(*)=0 Processing Document # 1
at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextDeletedRowKey(SqlEntityProcessor.java:91)
at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextDeletedRowKey(EntityProcessorWrapper.java:258)
at
 org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:636)
at
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:258)
at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
at
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
 Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
 You have an error in your SQL syntax; check the manual that
 corresponds to your MySQL serv
 er version for the right syntax to use near 'and pl.deleted='' having
 count(*)=0' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
 Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
 Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:407)
at com.mysql.jdbc.Util.getInstance(Util.java:382)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3603)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3535)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1989)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2150)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2620)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2570)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:779)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:622)
at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)

 Elaine



High Query Volume

2011-07-15 Thread Sheetal
Hello,

I am using Solr MoreLikeThis for finding similar results. I have all my data
indexed in my Solr server, and the index is huge: it runs to millions of
documents.

What I am trying to do is: given an ID, it should check the contents of
that respective ID and give me results similar to the contents of that
ID.

My problem is that the contents of some IDs are very large, so the term
vectors/term frequencies become huge. Also, the maximum number of query
terms (mlt.maxqt) to include in the generated query depends upon the
ID, as some documents have hundreds of terms and some have millions. As I
have the ID and its contents, I can compute and pass mlt.maxqt depending
upon the ID. So, depending upon the contents, my query limit is sometimes
mlt.maxqt=100, sometimes mlt.maxqt=1000, and sometimes even higher...
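
(For reference, my requests look roughly like this -- handler path and field
name simplified for illustration:
http://localhost:8983/solr/mlt?q=id:somedoc&mlt.fl=contents&mlt.maxqt=1000)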

If mlt.maxqt=100, then results come back pretty fast. But when it is
mlt.maxqt=1000 or more, it is obviously far too slow... Is there anything I
can do to solve this issue, or to scale Solr in some way? Is there a way I
can handle a huge query volume in searching? I know the default number of
query terms is 25, but I need a lot more than that. Am I using the right
tool (Solr MoreLikeThis)?

Also, I have my Solr running with 2GB and my application running with 2GB.

Any thoughts and help would be really helpful. Thank you in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/High-Query-Volume-tp3172274p3172274.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH full-import - when is commit() actually triggered?

2011-07-15 Thread Frank Wesemann

Hello,
I am running a full import with a quite plain data-config (a root entity
with three sub-entities) from a JDBC datasource.

This import is expected to add approximately 10 million documents.
What I now see from my logfiles is that a newSearcher event is fired
about every five seconds.

This causes a lot of load on the machine.
While searching *:* via the admin interface it appears that on every
new commit about 1,000 docs are newly added.
This is the batchSize I configured in the datasource definition, but I
don't think that this is related.

In solrconfig I have

<updateHandler class="solr.DirectUpdateHandler2" enable="true">
   <maxPendingDeletes>100000</maxPendingDeletes>
   <autoCommit>
      <maxDocs>100000</maxDocs> <!-- maximum uncommitted docs before
autocommit is triggered -->
      <maxTime>300000</maxTime>
   </autoCommit>
</updateHandler>


What other parameters in solrconfig.xml or in my data-config may be 
related to this behaviour?

Any hint is appreciated.

Thanks
frank

--
Kind regards,

Frank Wesemann
Fotofinder GmbH         USt-IdNr. DE812854514
Software Development    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Registered office: Berlin
Amtsgericht Berlin Charlottenburg (HRB 73099)
Managing director: Ali Paczensky





Data Import from a Queue

2011-07-15 Thread Brandon Fish
Does anyone know of any existing examples of importing data from a queue
into Solr?

Thank you.


RE: ' invisible ' words

2011-07-15 Thread Jagdish Vasani
Hi deniz 

You can use Luke (http://www.getopt.org/luke/) and see how that field is
indexed, i.e., which words are in that field. That may help you figure out how
you indexed your field.

Thanks.
Jagdish



-Original Message-
From: deniz [mailto:denizdurmu...@gmail.com] 
Sent: Thursday, July 14, 2011 2:57 PM
To: solr-user@lucene.apache.org
Subject: Re: ' invisible ' words

Well, I know it is totally weird... I have tried many things, including the
ones in this forum, but the result is the same... somehow some words are
just invisible...



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/invisible-words-tp3158060p3168598.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is it possible to extract all the tokens from solr?

2011-07-15 Thread Jagdish Vasani
Check the LukeRequestHandler at - http://wiki.apache.org/solr/LukeRequestHandler

This will give you all you need.

Thanks,
Jagdish



-Original Message-
From: pravesh [mailto:suyalprav...@yahoo.com] 
Sent: Thursday, July 14, 2011 2:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Is it possible to extract all the tokens from solr?

You can use Lucene for doing this. It provides the TermEnum API to enumerate
all terms of a field (or fields).
 Solr 1.4+ also provides a special request handler for this purpose. Check
it out if that helps.
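
A rough sketch with the Lucene 3.x API (untested; the index path and field
name are placeholders):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class DumpTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader =
            IndexReader.open(FSDirectory.open(new File("solr/data/index")));
        // position the enumeration at the first term of field "text"
        TermEnum terms = reader.terms(new Term("text", ""));
        do {
            Term t = terms.term();
            if (t == null || !"text".equals(t.field())) break;  // left the field
            System.out.println(t.text() + " (docFreq=" + terms.docFreq() + ")");
        } while (terms.next());
        terms.close();
        reader.close();
    }
}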

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-extract-all-the-tokens-from-solr-tp3168362p3168589.html
Sent from the Solr - User mailing list archive at Nabble.com.


Max Rows

2011-07-15 Thread Alejandro Delgadillo
Hi guys!

For the past year I've been using Solr with ColdFusion as the search engine
for a library; so far so good. I've managed to index different collections
using only the 5 custom fields (category, custom1 ... 5) available. Since
it has worked so well, I decided to use Solr to make something like a general
search of all the collections. What I did was that in the cfsearch tag,
under the "collection" attribute, I separated all of the collections with
commas.

Works like a charm, but there is one problem: the maxrows attribute is set
to 10, which means 10 results per page. But when you put several collections
in the attribute, it adds 10 results per page per collection, so if I have
4 comma-separated collections, maxrows forces itself to 40 results per page
instead of the 10 total I'm aiming for.

My question is: Is there a way to fix this? Is there a way to make the
maxrows attribute global and prevent it from adding more rows per collection?

Thanks in advance.
Alex.


Re: Need Suggestion

2011-07-15 Thread Rohit Gupta
I am using -Xms2g and -Xmx6g 

What would be the ideal JVM size?

Regards,
Rohit




From: Mohammad Shariq shariqn...@gmail.com
To: solr-user@lucene.apache.org
Sent: Fri, 15 July, 2011 7:27:38 PM
Subject: Re: Need Suggestion

Below are certain things you can do to reduce search latency:
1) Do bulk inserts.
2) Commit after every ~5000 docs.
3) Optimize once a day.
4) In search queries, use the fq parameter.

What JVM heap size are you using?





On 15 July 2011 17:44, Rohit ro...@in-rev.com wrote:

 I am facing some performance issues on my Solr Installation (3core server).
 I am indexing live twitter data based on certain keywords, as you can
 imagine, the rate at which documents are received is very high and so the
 updates to the core is very high and regular. Given below are the document
 size on my three core.



 Twitter  - 26874747

 Core2-  3027800

 Core3-  6074253



 My Server configuration has 8GB RAM, but now we are experiencing server
 performance drop. What can be done to improve this?  Also, I have a few
 questions.



 1.  Does the number of commit takes high memory? Will reducing the
 number of commits per hour help?
 2.  Most of my queries are field or date faceting based? how to improve
 those?



 Regards,

 Rohit





 Regards,

 Rohit

 Mobile: +91-9901768202

 About Me: http://about.me/rohitg






-- 
Thanks and Regards
Mohammad Shariq


Re: DIH full-import - when is commit() actually triggered?

2011-07-15 Thread Ahmet Arslan

 I am running a full import with a quite plain data-config
 (a root entity with three sub entities ) from a jdbc
 datasource.
 This import is expected to add approximately 10 mio
 documents
 What I now see from my logfiles is, that a newSearcher
 event is fired about every five seconds.

This is triggered by autoCommit every 300,000 milliseconds.
You need to remove <maxTime>300000</maxTime> to disable this mechanism.




Re: How to use solr.PatternReplaceFilterFactory with ampersand in pattern

2011-07-15 Thread M Singh
That works.  Thanks.


From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: M Singh mans6si...@yahoo.com
Sent: Thu, July 14, 2011 4:37:57 PM
Subject: Re: How to use solr.PatternReplaceFilterFactory with ampersand in 
pattern

You're in XML so you must escape it properly with &amp; etc.
 Hi:
 
 I am using the solr.PatternReplaceFilterFactory with pattern as follows to
 escape ampersand and $ signs:
 
 <filter class="solr.PatternReplaceFilterFactory" pattern="(&)"
 replacement=" "/>
 
 I am getting an error due to the embedded ampersand:

 [Fatal Error] schema.xml:82:71: The entity name must immediately follow the
 '&' in the entity reference.
 Exception in thread "main" org.xml.sax.SAXParseException: The entity name
 must immediately follow the '&' in the entity reference.
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 
 
 Is there anyway to make it work ?
 
 Appreciate your help.
 
 Thanks.


Re: Solr Ecosystem / Integration wiki pages

2011-07-15 Thread Chris Hostetter

: integrating Solr with other applications.  What isn't there is a list of
: what web/email/file crawlers exist, data integration pipelines, and there
: are some other odds and ends like distributions/forks of Solr (Lucid &
: Constellio), and Solandra.    So I started to put together this page:
: http://wiki.apache.org/solr/SolrEcosystem  instead of essentially
: duplicating what's on SolrIntegration I linked to it.  I suspect that some
: might feel that all this information should live on SolrIntegration and so I
: should move this.  Yes?  I really liked the idea of naming this Solr
: Ecosystem but I admit that when it comes down to it, it's basically about
: integrating with Solr.  
: 
: Any thoughts on this from anyone?

Looks fine to me.

I think it makes sense to have a distinction between the ecosystem of 
tools and such that might be of interest to people using Solr (which may 
or may not know about Solr directly), and tools that exist specifically to 
integrate Solr with other things.  

I updated both pages to try and clarify their purpose.

One thing that would be nice on the Ecosystem page is to better call out 
when/how these things can be used with Solr by linking to info about that 
rather than just putting a *S* next to them -- if there isn't a document 
somewhere on those sites mentioning Solr, then claiming they have some 
level of Solr integration is kind of misleading.



-Hoss


Re: Extending Solr Highlighter to pull information from external source

2011-07-15 Thread Jamie Johnson
Boy it's been a long time since I first wrote this, sorry for the delay

I think I have this working as I expect with a test implementation.  I
created the following interface

public interface SolrExternalFieldProvider extends NamedListInitializedPlugin {
    public String[] getFieldContent(String key, SchemaField field,
        SolrQueryRequest request);
}

I then added to DefaultSolrHighlighter the following:

in init()

SolrExternalFieldProvider defaultProvider =
    solrCore.initPlugins(info.getChildren("externalFieldProvider"),
        externalFieldProviders, SolrExternalFieldProvider.class, null);
if (defaultProvider != null) {
    externalFieldProviders.put("", defaultProvider);
    externalFieldProviders.put(null, defaultProvider);
}
then in doHighlightByHighlighter I added the following

if (schemaField != null && !schemaField.stored()) {
    SolrExternalFieldProvider externalFieldProvider =
        this.getExternalFieldProvider(fieldName, params);
    if (externalFieldProvider != null) {
        SchemaField keyField = schema.getUniqueKeyField();
        // I know this field exists and is not multivalued
        String key = doc.getValues(keyField.getName())[0];
        if (key != null && key.length() > 0) {
            docTexts = externalFieldProvider.getFieldContent(key,
                schemaField, req);
        }
    } else {
        docTexts = new String[]{};
    }
} else {
    docTexts = doc.getValues(fieldName);
}


This worked for me.  I needed to include the req because there are
some additional things that I need to have from it; I figure this is
probably something other folks will need as well.  I tried to follow
the pattern used for the other highlighter pieces in that you can have
different externalFieldProviders for each field.  I'm more than happy
to share the actual classes with the community or add them to one of
the JIRA issues mentioned below; I haven't done so yet because I don't
know how to build patches.
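
For reference, a trivial implementation of the interface might look like
this (a sketch only; the in-memory map is a placeholder for whatever
external store you actually use):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.schema.SchemaField;

public class MapBackedFieldProvider implements SolrExternalFieldProvider {
    private final Map<String, String[]> store =
        new HashMap<String, String[]>();

    public void init(NamedList args) {
        // no-op; required by NamedListInitializedPlugin
    }

    public String[] getFieldContent(String key, SchemaField field,
            SolrQueryRequest request) {
        // look up the un-stored field content by unique key + field name
        String[] content = store.get(key + "/" + field.getName());
        return content != null ? content : new String[0];
    }
}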

On Mon, Jun 20, 2011 at 11:47 PM, Michael Sokolov soko...@ifactory.com wrote:
 I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not
 much going on there

 LUCENE-1522 (https://issues.apache.org/jira/browse/LUCENE-1522) has a lot of
 fascinating discussion on this topic though


 There is a couple of long lived issues in jira for this (I'd like to try
 to search
 them, but I couldn't access jira now).

 For FVH, it is needed to be modified at Lucene level to use external data.

 koji

 Koji - is that really so?  It appears to me that would could extend
 BaseFragmentsBuilder and override

 createFragments(IndexReader reader, int docId,
      String fieldName, FieldFragList fieldFragList, int maxNumFragments,
      String[] preTags, String[] postTags, Encoder encoder )

 providing a version that retrieves text from some external source rather
 than from Lucene fields.

 It sounds to me like a really useful modification in Lucene core would be to
 retain match points that have already been computed during scoring so the
 highlighter doesn't have to attempt to reinvent all that logic!  This has
 all been discussed at length in LUCENE-1522 already, but is there is any
 recent activity?

 My hope is that since (at least in my test) search code seems to spend 80%
 of its time highlighting, folks will take up this banner and do the plumbing
 needed to improve it - should lead to huge speed-ups for searching!  I'm
 continuing to read, but not really capable of making a meaningful
 contribution at this point.

 -Mike



Re: Solr Ecosystem / Integration wiki pages

2011-07-15 Thread Smiley, David W.
Thanks for offering feedback; if nobody commented I was going to send an FYI 
post to the dev list.
Comments below.

On Jul 15, 2011, at 3:39 PM, Chris Hostetter wrote:

 
 : integrating Solr with other applications.  What isn't there is a list of
 : what web/email/file crawlers exist, data integration pipelines, and there
  : are some other odds and ends like distributions/forks of Solr (Lucid &
  : Constellio), and Solandra.    So I started to put together this page:
 : http://wiki.apache.org/solr/SolrEcosystem  instead of essentially
 : duplicating what's on SolrIntegration I linked to it.  I suspect that some
 : might feel that all this information should live on SolrIntegration and so I
 : should move this.  Yes?  I really liked the idea of naming this Solr
 : Ecosystem but I admit that when it comes down to it, it's basically about
 : integrating with Solr.  
 : 
 : Any thoughts on this from anyone?
 
 Looks fine to me.
 
  I think it makes sense to have a distinction between the ecosystem of
  tools and such that might be of interest to people using Solr (which may
  or may not know about Solr directly), and tools that exist specifically to
  integrate Solr with other things.
 
 I updated both pages to try and clarify their purpose.

I noticed your change on IntegratingSolr but not SolrEcosystem, which is still 
at rev#3.

  One thing that would be nice on the Ecosystem page is to better call out
  when/how these things can be used with Solr by linking to info about that
  rather than just putting a *S* next to them -- if there isn't a document
  somewhere on those sites mentioning Solr, then claiming they have some
  level of Solr integration is kind of misleading.

I agree that adding a link would be helpful.  By the way, every *S* was 
deliberately placed there by me because I identified the existence of 
Solr-specific integration. Do you believe I misattributed an *S*?

~ David

Re: Extending Solr Highlighter to pull information from external source

2011-07-15 Thread Jamie Johnson
I added the highlighting code I am using to this JIRA
(https://issues.apache.org/jira/browse/SOLR-1397).  Afterwards I
noticed this JIRA (https://issues.apache.org/jira/browse/SOLR-1954)
which talks about another solution.  I think David's patch would have
worked equally well for my problem; it would just require doing the
highlighting later on the client's end.  I'll have to give this a whirl over
the weekend.

On Fri, Jul 15, 2011 at 3:55 PM, Jamie Johnson jej2...@gmail.com wrote:
 Boy it's been a long time since I first wrote this, sorry for the delay

 I think I have this working as I expect with a test implementation.  I
 created the following interface

 public interface SolrExternalFieldProvider extends NamedListInitializedPlugin 
 {
        public String[] getFieldContent(String key, SchemaField field,
 SolrQueryRequest request);
 }

 I then added to DefaultSolrHighlighter the following:

 in init()

 SolrExternalFieldProvider defaultProvider =
 solrCore.initPlugins(info.getChildren("externalFieldProvider") ,
 externalFieldProviders,SolrExternalFieldProvider.class,null);
            if(defaultProvider != null){
                externalFieldProviders.put("", defaultProvider);
                externalFieldProviders.put(null, defaultProvider);
            }
 then in doHighlightByHighlighter I added the following

 if(schemaField != null && !schemaField.stored()){
                        SolrExternalFieldProvider externalFieldProvider =
 this.getExternalFieldProvider(fieldName, params);
                        if(externalFieldProvider != null){
                    SchemaField keyField = schema.getUniqueKeyField();
                    String key = doc.getValues(keyField.getName())[0];  //I
 know this field exists and is not multivalued
                    if(key != null && key.length() > 0){
                        docTexts = externalFieldProvider.getFieldContent(key,
 schemaField, req);
                    }
                        } else {
                                docTexts = new String[]{};
                        }
                }

                else {
                docTexts = doc.getValues(fieldName);
        }


 This worked for me.  I needed to include the req because there are
 some additional thing that I need to have from it, I figure this is
 probably something else folks will need as well.  I tried to follow
 the pattern used for the other highlighter pieces in that you can have
 different externalFieldProviders for each field.  I'm more than happy
 to share the actual classes with the community or add them to one of
 the JIRA issues mentioned below, I haven't done so yet because I don't
 know how to build patches.

 On Mon, Jun 20, 2011 at 11:47 PM, Michael Sokolov soko...@ifactory.com 
 wrote:
 I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not
 much going on there

 LUCENE-1522 (https://issues.apache.org/jira/browse/LUCENE-1522) has a lot of
 fascinating discussion on this topic though


 There is a couple of long lived issues in jira for this (I'd like to try
 to search
 them, but I couldn't access jira now).

 For FVH, it is needed to be modified at Lucene level to use external data.

 koji

 Koji - is that really so?  It appears to me that would could extend
 BaseFragmentsBuilder and override

 createFragments(IndexReader reader, int docId,
      String fieldName, FieldFragList fieldFragList, int maxNumFragments,
      String[] preTags, String[] postTags, Encoder encoder )

 providing a version that retrieves text from some external source rather
 than from Lucene fields.

 It sounds to me like a really useful modification in Lucene core would be to
 retain match points that have already been computed during scoring so the
 highlighter doesn't have to attempt to reinvent all that logic!  This has
 all been discussed at length in LUCENE-1522 already, but is there is any
 recent activity?

 My hope is that since (at least in my test) search code seems to spend 80%
 of its time highlighting, folks will take up this banner and do the plumbing
 needed to improve it - should lead to huge speed-ups for searching!  I'm
 continuing to read, but not really capable of making a meaningful
 contribution at this point.

 -Mike




Indexing PDF documents with no UniqueKey

2011-07-15 Thread sabman
I want to index PDF (and other rich) documents. I am using the
DataImportHandler.

Here is how my schema.xml looks:

.
.
 <field name="title" type="text" indexed="true" stored="true"
multiValued="false"/>
   <field name="description" type="text" indexed="true" stored="true"
multiValued="false"/>
   <field name="date_published" type="string" indexed="false" stored="true"
multiValued="false"/>
   <field name="link" type="string" indexed="true" stored="true"
multiValued="false" required="false"/>
   <dynamicField name="attr_*" type="textgen" indexed="true" stored="true"
multiValued="false"/>


<uniqueKey>link</uniqueKey>


As you can see, I have set link as the unique key so that when indexing
happens, documents are not duplicated. Now I have the file paths stored
in a database, and I have set the DataImportHandler to get a list of all the
file paths and index each document. To test it I used the tutorial.pdf file
that comes with the example docs in Solr. The problem is of course that this
pdf document won't have a field 'link'. I am thinking of a way I can manually
set the file path as the link when indexing these documents. I tried the
data-config settings below,

 <entity name="fileItems" rootEntity="false" dataSource="dbSource"
query="select path from file_paths">
   <entity name="tika-test" processor="TikaEntityProcessor"
url="${fileItems.path}" dataSource="fileSource">
     <field column="title" name="title" meta="true"/>
     <field column="Creation-Date" name="date_published" meta="true"/>
     <entity name="filePath" dataSource="dbSource" query="SELECT path FROM
file_paths as link where path = '${fileItems.path}'">
       <field column="link" name="link"/>
     </entity>
   </entity>
 </entity>


where I create a sub-entity which queries for the path name and makes it
return the results in a column titled 'link'. But I still see this error:

WARNING: Error creating document :
SolrInputDocument[{date_published=date_published(1.0)={2011-06-23T12:47:45Z},
title=title(1.0)={Solr tutorial}}]
org.apache.solr.common.SolrException: Document is missing mandatory
uniqueKey field: link

Is there any way for me to create a field called link for the pdf documents?



This was already asked here before:
http://lucene.472066.n3.nabble.com/Trouble-with-exception-Document-Null-missing-required-field-DocID-td1641048.html
The solution provided there uses ExtractingRequestHandler, but I want
to use it through the DataImportHandler.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-PDF-documents-with-no-UniqueKey-tp3173272p3173272.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Extending Solr Highlighter to pull information from external source

2011-07-15 Thread Jamie Johnson
I tried the patch at SOLR-1397 but it didn't work as I'd expect.

<lst name="highlighting">
  <lst name="1">
    <arr name="subject_phonetic">
      <str><em>Test</em> subject message</str>
    </arr>
    <arr name="subject_phonetic_startPos"><int>0</int></arr>
    <arr name="subject_phonetic_endPos"><int>29</int></arr>
  </lst>
</lst>
The start position is right, but the end position seems to be the
length of the field.


On Fri, Jul 15, 2011 at 4:25 PM, Jamie Johnson jej2...@gmail.com wrote:
 I added the highlighting code I am using to this JIRA
 (https://issues.apache.org/jira/browse/SOLR-1397).  Afterwards I
 noticed this JIRA (https://issues.apache.org/jira/browse/SOLR-1954)
 which talks about another solution.  I think David's patch would have
 worked equally well for my problem, just would require later doing the
 highlighting on the clients end.  I'll have to give this a whirl over
 the weekend.

 On Fri, Jul 15, 2011 at 3:55 PM, Jamie Johnson jej2...@gmail.com wrote:
 Boy it's been a long time since I first wrote this, sorry for the delay

 I think I have this working as I expect with a test implementation.  I
 created the following interface

 public interface SolrExternalFieldProvider extends 
 NamedListInitializedPlugin {
        public String[] getFieldContent(String key, SchemaField field,
 SolrQueryRequest request);
 }

 I then added to DefaultSolrHighlighter the following:

 in init()

 SolrExternalFieldProvider defaultProvider =
  solrCore.initPlugins(info.getChildren("externalFieldProvider") ,
 externalFieldProviders,SolrExternalFieldProvider.class,null);
            if(defaultProvider != null){
                externalFieldProviders.put("", defaultProvider);
                externalFieldProviders.put(null, defaultProvider);
            }
 then in doHighlightByHighlighter I added the following

  if(schemaField != null && !schemaField.stored()){
                        SolrExternalFieldProvider externalFieldProvider =
 this.getExternalFieldProvider(fieldName, params);
                        if(externalFieldProvider != null){
                    SchemaField keyField = schema.getUniqueKeyField();
                    String key = doc.getValues(keyField.getName())[0];  //I
 know this field exists and is not multivalued
                    if(key != null && key.length() > 0){
                        docTexts = externalFieldProvider.getFieldContent(key,
 schemaField, req);
                    }
                        } else {
                                docTexts = new String[]{};
                        }
                }

                else {
                docTexts = doc.getValues(fieldName);
        }


 This worked for me.  I needed to include the req because there are
 some additional thing that I need to have from it, I figure this is
 probably something else folks will need as well.  I tried to follow
 the pattern used for the other highlighter pieces in that you can have
 different externalFieldProviders for each field.  I'm more than happy
 to share the actual classes with the community or add them to one of
 the JIRA issues mentioned below, I haven't done so yet because I don't
 know how to build patches.

 On Mon, Jun 20, 2011 at 11:47 PM, Michael Sokolov soko...@ifactory.com 
 wrote:
 I found https://issues.apache.org/jira/browse/SOLR-1397 but there is not
 much going on there

  LUCENE-1522 (https://issues.apache.org/jira/browse/LUCENE-1522) has a lot of
 fascinating discussion on this topic though


 There is a couple of long lived issues in jira for this (I'd like to try
 to search
 them, but I couldn't access jira now).

 For FVH, it is needed to be modified at Lucene level to use external data.

 koji

 Koji - is that really so?  It appears to me that would could extend
 BaseFragmentsBuilder and override

 createFragments(IndexReader reader, int docId,
      String fieldName, FieldFragList fieldFragList, int maxNumFragments,
      String[] preTags, String[] postTags, Encoder encoder )

 providing a version that retrieves text from some external source rather
 than from Lucene fields.

 It sounds to me like a really useful modification in Lucene core would be to
 retain match points that have already been computed during scoring so the
 highlighter doesn't have to attempt to reinvent all that logic!  This has
 all been discussed at length in LUCENE-1522 already, but is there is any
 recent activity?

 My hope is that since (at least in my test) search code seems to spend 80%
 of its time highlighting, folks will take up this banner and do the plumbing
 needed to improve it - should lead to huge speed-ups for searching!  I'm
 continuing to read, but not really capable of making a meaningful
 contribution at this point.

 -Mike





Re: SolrCloud Sharding

2011-07-15 Thread Shalin Shekhar Mangar
On Fri, Jul 15, 2011 at 4:51 PM, Jamie Johnson jej2...@gmail.com wrote:

 Thanks Shalin.  I don't necessarily have an issue running off this
 patch, but before I do that or implement my own sharding logic I
 wonder if you could let me know your thoughts on the stability of the
 patch?  How well it works, basically.


To be frank, I've no idea. This is just the beginning of this feature so you
have to assume that the final result that goes into Solr can be very
different.

-- 
Regards,
Shalin Shekhar Mangar.


Analysis page output vs. actually getting search matches, a discrepancy?

2011-07-15 Thread Robert Petersen
I have a problem searching for one mfg name (out of our 10MM product titles).
It is indexed in a text-type field having about the same analyzer
settings as the Solr example text field definition, and most everything
works fine, but we found this one example where I cannot get a direct hit.
In the Field Analysis page, it sure looks like it would *have* to
match, but sadly during searches it just doesn't.  I can get it to match
by turning off 'split on case change', but that breaks many other
searches like 'appleTV', which need to split on case change to match
'apple tv' in our content!

 

If I search for SterlingTek's anything, I get zero results.

If I change the casing to Sterlingtek's in my query, I get all the
results.

If I turn off 'split on case change', then the first query gets results too.

See the verbose analysis output below for the actual filter settings. I put
the non-verbose output first for easier reading (I hope the tables don't get
lost during posting to this group), but the analysis shows a complete
match-up; that is what I don't get:
 

Field Analysis

Field value (Index):  SterlingTek's NB-2LH
Field value (Query):  SterlingTek's NB-2LH
(verbose output on, highlight matches on)

Index Analyzer (tokens after each stage):

  1. SterlingTek's | NB-2LH
  2. SterlingTek's | NB-2LH
  3. SterlingTek's | NB-2LH
  4. Sterling | Tek | NB | 2 | LH | SterlingTek
  5. sterling | tek | nb | 2 | lh | sterlingtek
  6. sterling | tek | nb | 2 | lh | sterlingtek
  7. sterling | tek | nb | 2 | lh | sterlingtek

Note every term is highlighted in the last row above, meaning all have
a match, right???

Query Analyzer (tokens after each stage):

  1. SterlingTek's | NB-2LH
  2. SterlingTek's | NB-2LH
  3. SterlingTek's | NB-2LH
  4. Sterling | Tek | NB | 2 | LH
  5. sterling | tek | nb | 2 | lh
  6. sterling | tek | nb | 2 | lh
  7. sterling | tek | nb | 2 | lh

VERBOSE OUTPUT FOLLOWS:


Index Analyzer


org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=index_synonyms.txt, expand=true, ignoreCase=true}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
splitOnCaseChange=1, generateNumberParts=1, catenateWords=1,
generateWordParts=1, catenateAll=0, catenateNumbers=1}
  pos 1: Sterling (word, 0,8)
  pos 2: Tek (word, 8,11)
  pos 3: NB (word, 14,16)
  pos 4: 2 (word, 17,18)
  pos 5: LH (word, 18,20)
  catenated: SterlingTek (word, 0,11)

org.apache.solr.analysis.LowerCaseFilterFactory {}
  pos 1: sterling (word, 0,8)
  pos 2: tek (word, 8,11)
  pos 3: nb (word, 14,16)
  pos 4: 2 (word, 17,18)
  pos 5: lh (word, 18,20)
  catenated: sterlingtek (word, 0,11)

com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory
{protected=protwords.txt}
  pos 1: sterling (word, 0,8)
  pos 2: tek (word, 8,11)
  pos 3: nb (word, 14,16)
  pos 4: 2 (word, 17,18)
  pos 5: lh (word, 18,20)
  catenated: sterlingtek (word, 0,11)

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
  pos 1: sterling (word, 0,8)
  pos 2: tek (word, 8,11)
  pos 3: nb (word, 14,16)
  pos 4: 2 (word, 17,18)
  pos 5: lh (word, 18,20)
  catenated: sterlingtek (word, 0,11)




Query Analyzer


org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.SynonymFilterFactory
{synonyms=query_synonyms.txt, expand=true, ignoreCase=true}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
ignoreCase=true}
  pos 1: SterlingTek's (word, 0,13)
  pos 2: NB-2LH (word, 14,20)

org.apache.solr.analysis.WordDelimiterFilterFactory {preserveOriginal=0,
splitOnCaseChange=1, generateNumberParts=1, catenateWords=0,
generateWordParts=1, catenateAll=0, catenateNumbers=0}
  pos 1: Sterling (word, 0,8)
  pos 2: Tek (word, 8,11)
  pos 3: NB (word, 14,16)
  pos 4: 2 (word, 17,18)
  pos 5: LH (word, 18,20)




Re: Analysis page output vs. actually getting search matches, a discrepancy?

2011-07-15 Thread Chris Hostetter

: Subject: Analysis page output vs. actually getting search matches,
: a discrepancy?

99% of the time when people ask questions like this, it's because of 
confusion about how/when query parsing comes into play (as opposed to 
analysis) -- analysis.jsp only shows you part of the equation; it doesn't 
know what query parser you are using.

You mentioned that you aren't getting matches when you expect them, and 
you provided the analysis.jsp output, but you didn't mention anything 
about the request you are making, the query parser used, etc.  It would 
be good to know the full query URL, along with the debugQuery output 
(e.g., append &debugQuery=on to the request) showing the final query 
toString info.

If that info doesn't clear up the discrepancy, you should also take a look 
at the explainOther info for the doc that you expect to match but isn't 
-- if you still aren't sure what's going on, post all of that info to 
solr-user and folks can probably help you make sense of it.

(All that said: in some instances this type of problem is simply that 
someone changed the schema and didn't reindex everything, so the indexed 
terms don't really match what you think they do.)


-Hoss


Re: Getting the indexed value rather than the stored value

2011-07-15 Thread Chris Hostetter

: However, when I get the value of the field from a Solr query, I get the
: original sentence (some sentence like this) which is not what I want (in
: this particular case).

The stored field is always the original stored value -- analysis is only 
used for producing the indexed terms.

: For now, i ended up creating a custom updateprocessor and configured it in
: solrconfig.xml, but I would still like to know if there's a way through the
: SOLR API to get the actual indexed value (like the way the SOLR api does it)

An update processor is definitely the right way to go about a problem 
like this.

Solr actually doesn't have an efficient way to get the indexed values for 
a document; the very nature of the indexed values is that they are an 
*inverted* index -- it's efficient to go from indexed term -> doc, not the 
other way around.

The caveat to this is that things like the FieldCache and UnInvertedField 
can be used internally for fast lookup of indexed terms, but they have a 
heavy initialization cost to build up these data structures for each 
newSearcher.

Bottom line: an update processor (or generating this value in your indexing 
code) is the way to go.
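
e.g., a rough sketch of such a processor (Solr 3.x style; the field names,
analyzer wiring, and factory registration are placeholders, and this is
untested):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class StoreAnalyzedFormProcessor extends UpdateRequestProcessor {
    private final Analyzer analyzer;

    public StoreAnalyzedFormProcessor(Analyzer analyzer,
            UpdateRequestProcessor next) {
        super(next);
        this.analyzer = analyzer;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.solrDoc;
        Object raw = doc.getFieldValue("sentence");
        if (raw != null) {
            // run the analyzer ourselves and store its output alongside
            // the original value, in a separate stored field
            StringBuilder analyzed = new StringBuilder();
            TokenStream ts = analyzer.tokenStream("sentence",
                new StringReader(raw.toString()));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                if (analyzed.length() > 0) analyzed.append(' ');
                analyzed.append(term.toString());
            }
            ts.close();
            doc.addField("sentence_analyzed", analyzed.toString());
        }
        super.processAdd(cmd);
    }
}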

-Hoss


RE: Analysis page output vs. actually getting search matches, a discrepancy?

2011-07-15 Thread Robert Petersen
Hi Chris, 

Well, to start from the bottom of your list there: I restrict my testing
to one sku, reindexing the sku after every indexer-side change, and I
reload the core every time as well.  I just search from the admin page
using the word in question and an exact match on the sku field (the
unique one), like this:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">6</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">SterlingTek's NB-2LH sku:216473417</str>
    <str name="bbb">a</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>

I will have to find out more about query parsers before I can answer the
rest; will reply to that later... and it's Friday, after all!  :)

Thanks


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, July 15, 2011 4:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Analysis page output vs. actually getting search matches, a
discrepancy?


: Subject: Analysis page output vs. actually getting search matches,
: a discrepency?

99% of the time when people ask questions like this, it's because of 
confusion about how/when QueryParsing comes into play (as opposed to 
analysis) -- analysis.jsp only shows you part of the equation, it
doesn't 
know what query parser you are using.

you mentioned that you aren't getting matches when you expect them, and 
you provided the analysis.jsp output, but you didn't mention anything 
about the request you are making, the query parser used, etc.  It would
be good to know the full query URL, along with the debugQuery output 
showing the final query toString info.

if that info doesn't clear up the discrepancy, you should also take a look
at the explainOther info for the doc that you expect to match that isn't

-- if you still aren't sure what's going on, post all of that info to 
solr-user and folks can probably help you make sense of it.

(all that said: in some instances this type of problem is simply that 
someone changed the schema and didn't reindex everything, so the indexed

terms don't really match what you think they do)


-Hoss


Index rows with NULL value

2011-07-15 Thread Ruixiang Zhang
Hi

It seems that Solr does not index a row when some column of the row has a
NULL value.
How can I make Solr index these rows?

Thanks
Ruixiang


how to get one word frequency from a document

2011-07-15 Thread Allen
Hi All,

I am trying to use the TermVectorComponent to get the frequency of a word
in a particular document. Here is the URL I used:
q=someword+id%3Asomedoc&qt=tvrh&tv.all=true. But the result
includes the frequencies of all the words in that document. Are there any
query filters or request parameters that I can use to get one
particular word's frequency from a particular document?
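
(I know about parameters like tv.fl, which restricts the component to
particular fields -- e.g. q=id%3Asomedoc&qt=tvrh&tv.fl=content&tv.tf=true --
but that still returns every term of the field.)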

Thanks a lot.

-- 
Allen


Re: POST VS GET and NON English Characters

2011-07-15 Thread Sujatha Arun
It works fine with the GET method, but I am wondering why it does not with
the POST method.

2011/7/15 pankaj bhatt panbh...@gmail.com

 Hi Arun,
   This looks like an encoding issue to me.
    Can you change your browser settings to UTF-8 and hit the search URL
 via the GET method?

 We faced a similar problem with Chinese and Korean languages; this
 solved the problem.

 / Pankaj Bhatt.

 2011/7/15 Sujatha Arun suja.a...@gmail.com

  Hello,
 
  We have implemented solr search in  several languages .Intially we used
 the
  GET method for querying ,but later moved to  POST method to
 accomodate
  lengthy queries .
 
  When we moved form  GET TO POSt method ,the german characteres could no
  longer be searched and I had to use the fucntion utf8_decode in my
  application  for the search to work for german characters.
 
  Currently I am doing this  while quering using the POST method ,we are
  using
  the standard Request Handler
 
 
   $this->_queryterm = iconv("UTF-8", "ISO-8859-1//TRANSLIT//IGNORE",
   $this->_queryterm);
 
 
  This makes the query work for german characters and other languages but
  does
  not work for certain charactes  in Lithuvanian and spanish.Example:
  Not working:

    - Iš
    - Estremadūros
    - sNaująjį
    - MEDŽIAGOTYRA
    - MEDŽIAGOS
    - taškuose

  Working:

    - garbę
    - ieškoti
    - ispanų
 
  Any ideas /input  ?
 
  Regards
  Sujatha