Fwd: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-19 Thread Dominik Siebel
Hi folks,

I am currently migrating our Solr servers from a 4.0.0 nightly build
(approx. November 2011, which worked very well) to the newly released
4.0.0 and am running into some issues concerning the existing
DataImportHandler configurations. Maybe you have an idea where I am
going wrong here.

The following lines are a highly simplified excerpt from one of the
problematic imports:

<entity name="path" rootEntity="false" query="SELECT p.id, IF(p.name
IS NULL, '', p.name) AS name FROM path p GROUP BY p.id">

    <entity name="item" rootEntity="true" query="
        SELECT
            i.*,
            CONVERT('${dataimporter.functions.escapeSql(path.name)}' USING utf8) AS path_name
        FROM items i
        WHERE i.path_id = ${path.id}" />

</entity>

While this configuration worked without any problem for over half a
year, after upgrading to 4.0.0-BETA and 4.0.0 the import throws the
following stack trace and exits:

 SEVERE: Exception while processing: path document :
null:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.NullPointerException

which is caused by

Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79)

In other words: The EvaluatorBag doesn't seem to resolve the given
path.name variable properly and returns null.

Does anyone have any idea?
Appreciate your input!

Regards
Dom


Re: Solr 4.0 Master slave configuration in JBOSS 5.1.2

2012-10-19 Thread adityab
Can you please share some information on setting up Solr 4.0 as a single core?
I tried doing it and keep seeing a ClassNotFoundException for
KeywordTokenizerFactory on server startup.

I see the jar files being loaded in the logs but it's unable to find the
class.
Can you let me know what jars reside in your Solr home lib folder?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-0-Master-slave-configuration-in-JBOSS-5-1-2-tp3988375p4014683.html
Sent from the Solr - User mailing list archive at Nabble.com.


Data Writing Performance of Solr 4.0

2012-10-19 Thread higashihara_hdk
Hello everyone.

I have two questions. I am considering using Solr 4.0 to perform full-text
searches on the data output in real time by a Storm cluster
(http://storm-project.net/).

1. In particular, I'm concerned whether Solr would be able to keep up
with the 2000-message-per-second throughput of the Storm cluster. What
kind of throughput would I be able to expect from Solr 4.0, for example
on a Xeon 2.5GHz 4-core with HDD?

2. Also, how efficiently would Solr scale with clustering?

Any pertinent information would be greatly appreciated.

Hideki Higashihara


Re: Even after indexing a mysql table, in solr am not able to retrieve data after querying

2012-10-19 Thread Gora Mohanty
On 19 October 2012 12:07, Romita Saha romita.s...@sg.panasonic.com wrote:
[...]
 My data-config file is :

 <entity name="camera"
         query="SELECT id FROM camera">
     <field column="id" name="id"/>
     <field column="data" name="data"/>
 </entity>

 The related schema.xml file is :

 <field name="id" type="integer" indexed="true" stored="true"
  required="true"/>
 <field name="data" type="string" indexed="true" stored="true"
  required="true"/>

Your data field is required, but you are not SELECTing
it from MySQL. You probably want
query="SELECT id, data FROM camera"
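
A corrected entity definition along those lines would be (a sketch, assuming
the column really is named data):

<entity name="camera"
        query="SELECT id, data FROM camera">
    <field column="id" name="id"/>
    <field column="data" name="data"/>
</entity>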

Regards,
Gora


Re: Even after indexing a mysql table, in solr am not able to retrieve data after querying

2012-10-19 Thread Chandan Tamrakar
The status shows that all 4 of your records failed to index:

<str name="Total Documents Failed">4</str>



On Fri, Oct 19, 2012 at 12:22 PM, Romita Saha
romita.s...@sg.panasonic.comwrote:

 Hi,

 Even after indexing a mysql table, in Solr I am not able to retrieve data
 after querying. Here is the status after I run
 http://localhost:8983/solr/db/dataimport

 <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0
 documents.</str>
 <str name="Committed">2012-10-19 14:31:28</str>
 <str name="Total Documents Processed">0</str>
 <str name="Total Documents Failed">4</str>
 <str name="Time taken">0:0:0.524</str></lst>
 <str name="WARNING">This response format is experimental.  It is likely to
 change in the future
 .</str>
 </response>

 My data-config file is :

 <entity name="camera"
         query="SELECT id FROM camera">
     <field column="id" name="id"/>
     <field column="data" name="data"/>
 </entity>

 The related schema.xml file is :

 <field name="id" type="integer" indexed="true" stored="true"
  required="true"/>
 <field name="data" type="string" indexed="true" stored="true"
  required="true"/>

 In my database, id is of type int(11) and data is of type varchar(100).
 I am new to Solr. Could anyone please help?

 Thanks and regards,
 Romita Saha




-- 
Chandan Tamrakar


Re: Building an enterprise quality search engine using Apache Solr

2012-10-19 Thread dirk
Hi,
your question is not easy to answer. It depends on so many things that
there is no standard way to build an enterprise solution, and time planning
depends on just as many factors.

I can try to give you some brief notes about our solution, but there are
some differences in target group and data source. I am technically responsible
for the system disco (a research and discovery system) at the library of the
University of Münster. (Excuse me, I don't want to make a promotion tour
here - I earn no money with such activities :)). Ok, in this search engine,
based on Lucene, we search across about 200 million articles, books, journals and so
on. So we have data sources that differ in structure and also in the way of
delivery. At the beginning we thought: let's buy a solution in order to avoid
more or less all own development work. So we bought a commercial search engine,
which works on a Lucene core with proprietary business logic to
talk to the Lucene core. So far so good - or not so good. At that time I was the
only person on this project, and I needed nearly one and a half years
full-time to fulfil most features and requirements. And the reason
for that long time is not that I had no experience (I hope). I have worked
in this area for nearly 15 years in different companies, always as a developer in
J2EE. (That's rare today, because every experienced developer wants to
work as a leader or manager - that sounds better, and fewer project leaders
are outsourced. Ok, other topic.) And other universities (customers) who
built a comparable search engine in that environment took as long or
longer. So I am hopeful...

In Germany we say "der Teufel steckt im Detail" (literally: the devil is
hidden in the details), which means you start working and, in parallel,
the requirements change - sadly, in most cases after development has laid
down the software basis. For example, we needed a lot of time for fine-tuning
the ranking and for building a fully automatic mechanism to update
data sources. And it is one thing to get search running in development and
pass a first developer test; it is a completely different thing to make the system fit
for 24/7 service and run a production system without problems.

We spent most of our time on data pre-processing, because of the "shit in - shit out"
problem. Work on the quality of data is expensive, but you get no
appreciation for it, because everybody cares about search features. This
requirement showed us that it is mostly impossible to avoid your own development
completely.
The next thing is the user interface: not every feature a customer knows from good
old database-backed systems is easy to realize in a search engine,
because of the more or less flat data structure. So we had to develop one
service after the other in order to read additional information - in our
case, for example, runtime holdings information from our library.

Summarized: if you want to estimate a concrete duration for building a
complete production-ready enterprise search solution, you should talk to
some people with similar solutions, think through your own requirements in detail,
and then multiply your estimate by 2. Then perhaps you have a realistic
estimate.
Dirk



-
my developer logs 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Building-an-enterprise-quality-search-engine-using-Apache-Solr-tp4014557p4014688.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeeperException (NodeExists for /overseer): SolrCloud Multiple Collections - is it safe ignore these exceptions?

2012-10-19 Thread Jeevanandam Madanagopal
Thanks Mark! 

Cheers, Jeeva

On Oct 19, 2012, at 8:35 AM, Mark Miller markrmil...@gmail.com wrote:

 Yes, those exceptions are fine. These are cases where we try to delete the 
 node if it's there, but don't care if it's not there - things like that. In 
 some of these cases, ZooKeeper logs things we can't stop, even though it's 
 expected that sometimes we will try and remove nodes that are not there or 
 create nodes that are already there.
 
 - Mark
 
 On Thu, Oct 18, 2012 at 9:01 AM, Jeevanandam Madanagopal je...@myjeeva.com 
 wrote:
 Hello -
 
 While doing a prototype of SolrCloud with multiple collections, each collection 
 represents country-level data.
 - searching within a collection represents country level - local search
 - searching across collections represents global search
 
 Attached is a graph image of the SolrCloud structure.  For the prototype I'm running 
 an embedded ZooKeeper ensemble (5 replicated ZooKeeper servers).
 - Searching and indexing in the respective collection works well
 - Search across collections works well (for global search)
 
 
 
 
 While joining 'Collection2' to the ZooKeeper ensemble I noticed the following 
 KeeperExceptions in the logger.
 
 Question: is it safe to ignore these exceptions?
 
 Exception Log snippet:
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn$Factory run
 INFO: Accepted socket connection from /fe80:0:0:0:0:0:0:1%1:62700
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn 
 readConnectRequest
 INFO: Client attempting to establish new session at 
 /fe80:0:0:0:0:0:0:1%1:62700
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.NIOServerCnxn 
 finishSessionInit
 INFO: Established session 0x13a73521356000a with negotiated timeout 15000 for 
 client /fe80:0:0:0:0:0:0:1%1:62700
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor 
 pRequest
 INFO: Got user-level KeeperException when processing 
 sessionid:0x13a73521356000a type:create cxid:0x1 zxid:0xfffe 
 txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = 
 NodeExists for /overseer
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor 
 pRequest
 INFO: Got user-level KeeperException when processing 
 sessionid:0x13a73521356000a type:create cxid:0x2 zxid:0xfffe 
 txntype:unknown reqpath:n/a Error Path:/overseer Error:KeeperErrorCode = 
 NodeExists for /overseer
 Oct 18, 2012 4:54:26 PM org.apache.zookeeper.server.PrepRequestProcessor 
 pRequest
 INFO: Got user-level KeeperException when processing 
 sessionid:0x13a73521356000a type:delete cxid:0x4 zxid:0xfffe 
 txntype:unknown reqpath:n/a Error 
 Path:/live_nodes/mac-book-pro.local:7500_solr Error:KeeperErrorCode = NoNode 
 for /live_nodes/mac-book-pro.local:7500_solr
 Oct 18, 2012 4:54:26 PM org.apache.solr.common.cloud.ZkStateReader$3 process
 INFO: Updating live nodes
 
 Cheers, Jeeva
 
 
 
 
 -- 
 - Mark



diversity of search results?

2012-10-19 Thread Paul Libbrecht
Hello SOLR expert,

yesterday in our group we realized that a danger we may need to face is that a 
search result may include very similar results.
Of course, one would expect skimming so that near-duplicates showing almost the 
same content in a search result would be avoided, but we fear that this is not 
possible.

I was wondering if some technology, plugin, or even research exists that 
would enable a search result to be partially reordered so that diversity is 
ensured, for the first page of results at least.

I suppose that might be doable by processing the result page and the next (and 
the five next?) and pushing down some results if they are too similar to 
previous ones.

Hope I am being clear.

Paul

Re: Building an enterprise quality search engine using Apache Solr

2012-10-19 Thread Ahmet Arslan
Hi Alexandre,

Yes, it is active. ManifoldCF 1.0.1 was released yesterday :)
You can index content from SharePoint 2010 into Solr 4.0.0.

The end-user documentation and the 'ManifoldCF in Action' book are the two main resources.

http://manifoldcf.apache.org/release/release-1.0.1/en_US/end-user-documentation.html

http://www.manning.com/wright/


--- On Fri, 10/19/12, Alexandre Rafalovitch arafa...@gmail.com wrote:

 From: Alexandre Rafalovitch arafa...@gmail.com
 Subject: Re: Building an enterprise quality search engine using Apache Solr
 To: solr-user@lucene.apache.org
 Date: Friday, October 19, 2012, 7:18 AM
 This is the first time I have heard of this
 project. Looks interesting, but
 is it active?
 
 The integration FAQ seems to be talking about Solr 1.4, a bit
 out of date.
 
 Regards,
    Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from
 happening all
 at once. Lately, it doesn't seem to be working. 
 (Anonymous  - via GTD
 book)
 
 
 On Fri, Oct 19, 2012 at 12:37 AM, Jack Krupansky
 j...@basetechnology.com
 wrote:
  Take a look at Apache ManifoldCF for crawling
 enterprise repositories such
  as SharePoint (as well as lighterweight web crawling
 and file system
  crawling).
 
  http://manifoldcf.apache.org/en_US/index.html
 
  -- Jack Krupansky
 
  -Original Message- From: Venky Naganathan
  Sent: Thursday, October 18, 2012 2:21 PM
  To: solr-user@lucene.apache.org
  Subject: Building an enterprise quality search engine
 using Apache Solr
 
 
  Hello,
 
  Can some one please provide me advise on the below ?
 
  1) I am considering building an enterprise search
 engine that indexes



Re: diversity of search results?

2012-10-19 Thread dirk
Hi Paul,

yes, that's a typical problem in configuring a search engine. A solution
depends on your data. Sometimes you can overcome this problem by fine-tuning
your search engine at the boosting level. That's not easy and always based on
trial-and-error tests.

Another thing you can do is try to realize a data pre-processing step which
compensates for the causes of similar content in certain fields, e.g. in a title
field.
For example, if you have products with very similar titles and you boost such
a field, the result is that you will always find all those documents in the
result list. But if you go on and add some information (perhaps out of
other search fields) to this title field, you can perhaps reduce the
similarity. (A typical example in my branch: book titles in different volumes;
then I add the volume number and the year to the title field.)

Perhaps it is also necessary to cope with deduplication during pre-processing.
Here you can find an entry point:
http://wiki.apache.org/solr/Deduplication
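
As a starting point, the update chain from that wiki page looks roughly like
this (a sketch - adapt the fields list and the signature field to your schema):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>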

Dirk

   



-
my developer logs 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/diversity-of-search-results-tp4014692p4014696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query related to Solr XML

2012-10-19 Thread Leena Jawale

Hi,

I made a Solr XML data source in LucidWorks Enterprise v2.1. When I search in 
the Solr Admin for text, I am unable to get any results.
Could you help me with this?



Thanks  Regards,
Leena Jawale
Software Engineer Trainee
BFS BU
Phone No. - 9762658130
Email - leena.jaw...@lntinfotech.commailto:leena.jaw...@lntinfotech.com



The contents of this e-mail and any attachment(s) may contain confidential or 
privileged information for the intended recipient(s). Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail and 
using or disseminating the information, and must notify the sender and delete 
it from their system. LT Infotech will not accept responsibility or liability 
for the accuracy or completeness of, or the presence of any virus or disabling 
code in this e-mail


Saravanan Chinnadurai/Actionimages is out of the office.

2012-10-19 Thread Saravanan . Chinnadurai
I will be out of the office starting  18/10/2012 and will not return until
23/10/2012.

Please email to itsta...@actionimages.com  for any urgent issues.


Action Images is a division of Reuters Limited and your data will therefore be 
protected
in accordance with the Reuters Group Privacy / Data Protection notice which is 
available
in the privacy footer at www.reuters.com
Registered in England No. 145516   VAT REG: 397000555


Re: Antw: Re: How to retrieve field contents as UTF-8 from Solr-Index with SolrJ

2012-10-19 Thread Andreas Kahl
Fetching the same records using a raw HTTP request works fine and the
characters are OK. I am actually considering fetching the data in Java
via raw HTTP requests + XSLTResponseWriter as a workaround, but I want to
try it first using the 'native' way with SolrJ.

Andreas
 
 Jack Krupansky j...@basetechnology.com 18.10.2012 21:36  
Have you verified that the data was indexed properly (UTF-8 encoding)?
Try a 
raw HTTP request using the browser or curl and see how that field looks
in 
the resulting XML.

-- Jack Krupansky

-Original Message- 
From: Andreas Kahl
Sent: Thursday, October 18, 2012 1:10 PM
To: j...@basetechnology.com ; solr-user@lucene.apache.org
Subject: Antw: Re: How to retrieve field contents as UTF-8 from
Solr-Index 
with SolrJ

Jack,

Thanks for the hint, but we have already set URIEncoding="UTF-8" on all
our Tomcats, too.

Regards
Andreas

 Jack Krupansky  18.10.12 17.11 Uhr 
Have you verified that the data was indexed properly (UTF-8 encoding)?
It may be that your container does not have UTF-8 enabled. For example, with
Tomcat you need something like:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" ... />

Make sure your Connector element has URIEncoding="UTF-8" (for Tomcat.)

-- Jack Krupansky

-Original Message- 
From: Andreas Kahl
Sent: Thursday, October 18, 2012 10:53 AM
To: solr-user@lucene.apache.org
Subject: How to retrieve field contents as UTF-8 from Solr-Index with
SolrJ

Hello everyone,

we are trying to implement a simple servlet querying a Solr 3.5 index
with SolrJ. The query we send is an identifier in order to retrieve a
single record. From the result we extract one field to return. This
field contains an XML document with characters from several European
and Asian alphabets, so we need UTF-8.

Now we have the problem that the string returned by
marcXml = results.get(0).getFirstValue("marcxml").toString();
is not valid UTF-8, so the resulting XML document is not well formed.

Here is what we do in Java:

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", query.toString());
params.set("fl", "marcxml");
params.set("rows", 1);
try {
    QueryResponse result = server.query(params, SolrRequest.METHOD.POST);
    SolrDocumentList results = result.getResults();
    if (!results.isEmpty()) {
        marcXml = results.get(0).getFirstValue("marcxml").toString();
    }
} catch (Exception ex) {
    Logger.getLogger(MarcServer.class.getName()).log(Level.SEVERE, null, ex);
}


Charset.defaultCharset() is UTF-8 on both the querying machine and
the Solr server. Also, we tried BinaryResponseParser as well as
XMLResponseParser when instantiating CommonsHttpSolrServer.

Does anyone have a solution to this? Is this related to
https://issues.apache.org/jira/browse/SOLR-2034 ? Is there
perhaps a workaround?

Regards
Andreas





Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-19 Thread Erick Erickson
I wonder if you're getting hit by the browser caching the admin page and
serving up the old version? What happens if you try from a different
browser or purge the browser cache?

Of course you have to refresh the master admin page, there's no
automatic update but I assume you did that.

Best
Erick

On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote:
 Just discovered that the replication admin REST API reports the correct
 index version and generation:

 http://master_host:port/solr/replication?command=indexversion

 So is this a bug in the admin UI?

 Bill

 On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote:

 I just upgraded to Solr 4.0.0.  I noticed that after a delete by query,
 the index version, generation, and size remain unchanged on the master even
 though the documents have been deleted (num docs changed and those deleted
 documents no longer show up in query responses).  But on the slave both the
 index version, generation, and size are updated.  So I thought the master
 and slave were out of sync but in reality that is not true.

 What's going on here?

 Bill



Re: Solr 4.0 segment flush times have a big difference between two machines

2012-10-19 Thread Jun Wang
I have found that segment flushing is controlled by
DocumentsWriterFlushControl, and indexing is implemented by
DocumentsWriterPerThread. DocumentsWriterFlushControl has information about
the number of docs and the size of the RAM buffer, but this seems to be shared by
all DocumentsWriterPerThread instances. Is the RAM limit the sum of all the
DocumentsWriterPerThread buffers?

2012/10/19 Jun Wang wangjun...@gmail.com

 Hi

 I have 2 machines for a collection, and it's using DIH to import data. DIH
 is triggered via a URL request on one machine - let's call it A - and A will
 forward some of the index to machine B. Recently I have found that segment
 flushes happen more often on machine B. Here is part of INFOSTREAM.txt.

 Machine A:
 
 DWPT 0 [Thu Oct 18 20:06:20 PDT 2012; Thread-39]: flush postings as
 segment _4r3 numDocs=71616
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has 0
 deleted docs
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: new segment has no
 vectors; no norms; no docValues; prox; freqs
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]:
 flushedFiles=[_4r3_Lucene40_0.prx, _4r3.fdt, _4r3.fdx, _4r3.fnm,
 _4r3_Lucene40_0.tip, _4r3_Lucene40_0.tim, _4r3_Lucene40_0.frq]
 DWPT 0 [Thu Oct 18 20:06:21 PDT 2012; Thread-39]: flushed codec=Lucene40
 D

 Machine B
 --
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flush postings
 as segment _zi0 numDocs=4302
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
 has 0 deleted docs
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: new segment
 has no vectors; no norms; no docValues; prox; freqs
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]:
 flushedFiles=[_zi0_Lucene40_0.prx, _zi0.fdx, _zi0_Lucene40_0.tim, _zi0.fdt,
 _zi0.fnm, _zi0_Lucene40_0.frq, _zi0_Lucene40_0.tip]
 DWPT 0 [Thu Oct 18 21:41:22 PDT 2012; http-0.0.0.0-8080-3]: flushed
 codec=Lucene40
 D

 I have found that a flush occurs when the number of docs in RAM reaches
 7000~9000 on machine A, but the number on machine B is very different -
 almost always 4000.  It seems that every doc in the buffer uses more RAM on
 machine B than on machine A, which results in more flushes.  Does anyone know
 why this happens?

 My conf is here.

 <ramBufferSizeMB>64</ramBufferSizeMB><maxBufferedDocs>10</maxBufferedDocs>




 --
 from Jun Wang





-- 
from Jun Wang


SimpleTextCodec usage tips?

2012-10-19 Thread seralf
Hi

could anybody give some direction / suggestions on how to correctly
configure and use the SimpleTextCodec?
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/codecs/simpletext/SimpleTextCodec.html

I'd like to do some tests for debugging purposes, but I'm not sure how to
enable the pluggable codec interface.

As far as I understand, I have to use the codec factory in the schema.xml,
but I didn't understand where to configure and choose the specific codec.
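
My best guess so far (not verified - a sketch based on Solr 4.0's
SchemaCodecFactory) is a codecFactory in solrconfig.xml plus a per-fieldType
postingsFormat in schema.xml:

<!-- solrconfig.xml: let the schema pick the postings format per field type -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: write this field type's postings as plain text for debugging -->
<fieldType name="text_debug" class="solr.TextField" postingsFormat="SimpleText">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

Is that the right direction?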

thank you in advance (sorry if this question was posted earlier, I didn't
find any post on that),

Alfredo Serafini


Re: Apache Solr Quiz

2012-10-19 Thread Dmitry Kan
Thanks for the quiz. It is refreshing. Do you plan on covering other parts
of SOLR management, like various handlers, scoring, plugins, sharding etc?

Dmitry

On Wed, Oct 17, 2012 at 7:12 PM, Yulia Crowder yulia.crow...@gmail.comwrote:

 I love Solr!
 I have searched for a quiz about Solr and didn't find any on the net.
 I am pleased to say that I have conducted a Quiz about Solr:

 http://www.quizmeup.com/quiz/apache-solr-configuration

 It is built on a free wiki-based quiz site. You can, and are welcome to,
 improve my questions and add new questions.
 Hope you find it a useful and enjoyable way to learn about Solr.
 Comments?



Re: Query related to Solr XML

2012-10-19 Thread Erik Hatcher
Leena -

It's best to ask LucidWorks-related questions at http://support.lucidworks.com 
rather than on this e-mail list.

As for your issue, more information is needed in order to assist.  Did you 
start the Solr XML crawler?   Does your data source show that there are 
documents in the index?   If you simply press search (with an empty query) do 
you see documents?   (best, again, to respond to these questions at the 
LucidWorks support site)

Erik


On Oct 19, 2012, at 05:54 , Leena Jawale wrote:

 
 Hi,
 
 I made a Solr XML data source in LucidWorks Enterprise v2.1. When I search in 
 the Solr Admin for text, I am unable to get any results.
 Could you help me with this?
 
 
 
 Thanks  Regards,
 Leena Jawale
 Software Engineer Trainee
 BFS BU
 Phone No. - 9762658130
 Email - leena.jaw...@lntinfotech.commailto:leena.jaw...@lntinfotech.com
 
 
 
 The contents of this e-mail and any attachment(s) may contain confidential or 
 privileged information for the intended recipient(s). Unintended recipients 
 are prohibited from taking action on the basis of information in this e-mail 
 and using or disseminating the information, and must notify the sender and 
 delete it from their system. LT Infotech will not accept responsibility or 
 liability for the accuracy or completeness of, or the presence of any virus 
 or disabling code in this e-mail



Re: Query related to Solr XML

2012-10-19 Thread Otis Gospodnetic
Leena,

Please ask on Lucid fora. You'll get better and faster help there.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 19, 2012 5:54 AM, Leena Jawale leena.jaw...@lntinfotech.com
wrote:


 Hi,

 I made a Solr XML data source in LucidWorks Enterprise v2.1. When I search
 in the Solr Admin for text, I am unable to get any results.
 Could you help me with this?



 Thanks  Regards,
 Leena Jawale
 Software Engineer Trainee
 BFS BU
 Phone No. - 9762658130
 Email - leena.jaw...@lntinfotech.commailto:leena.jaw...@lntinfotech.com


 
 The contents of this e-mail and any attachment(s) may contain confidential
 or privileged information for the intended recipient(s). Unintended
 recipients are prohibited from taking action on the basis of information in
 this e-mail and using or disseminating the information, and must notify the
 sender and delete it from their system. LT Infotech will not accept
 responsibility or liability for the accuracy or completeness of, or the
 presence of any virus or disabling code in this e-mail



Easy question ? docs with empty geodata field

2012-10-19 Thread darul
Hello,

Looking to get all documents with an empty geolocation field, I have not
found any way to do it, e.g. with ['' TO *].

geodata being a specific field, do you have any solution?

Thanks,

Jul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751.html
Sent from the Solr - User mailing list archive at Nabble.com.


Getting count for Multi-Select Faceting

2012-10-19 Thread Stephane Gamard
Hi all, 

Congrats on the 4.0.0 delivery, it's a pleasure to work with! 

I have a small problem that I am trying to resolve elegantly: while using 
multi-select faceting it might happen that a facet is selected which is not 
part of the facet list (due to the limit, for example). When executing the query I 
cannot then get that facet value's count, as it is still outside the scope of the 
limit. 

for a sample query: 
http://192.168.160.2:8983/solr/select?fq={!tag=scat}category:Article&facet.field={!ex=scat}category&q=*:*&facet=true&facet.limit=5&facet.mincount=1

I have the following results:

<lst name="facet_fields">
  <lst name="category">
    <int name="Organic Papers">6225</int>
    <int name="Metal-Organic Papers">3055</int>
    <int name="Research Papers">236</int>
    <int name="Inorganic Papers">187</int>
    <int name="Addenda and Errata">59</int>
  </lst>
</lst>

Note that the facet (category:Article) is not present within the facet_fields 
result. I've thought of running 2 facet queries, where one is not tagged, and 
merging the 2 lists within the UI. Is that the best solution available, or should 
the facet of the fq be present (as sticky) within the facet list? 

Cheers, 

_Stephane

Re: Easy question ? docs with empty geodata field

2012-10-19 Thread darul
sorry, I meant this field called geodata in my schema:

<fieldType name="location" class="solr.LatLonType"
subFieldSuffix="_coordinate"/>
<field name="geodata" type="location" indexed="true" stored="true"/>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014752.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Writing Performance of Solr 4.0

2012-10-19 Thread Mark Miller
On Fri, Oct 19, 2012 at 2:50 AM, higashihara_hdk
higashihara_...@es-planning.jp wrote:
 Hello everyone.

 I have two questions. I am considering using Solr 4.0 to perform full
 searches on the data output in real-time by a Storm cluster
 (http://storm-project.net/).

 1. In particular, I'm concerned whether Solr would be able to keep up
 with the 2000-message-per-second throughput of the Storm cluster. What
 kind of throughput would I be able to expect from Solr 4.0, for example
 on a Xeon 2.5GHz 4-core with HDD?

It depends on the size of the messages and the analysis you will be applying.

But without any other info, yes, it's possible depending on your data
and how you massage it.


 2. Also, how efficiently would Solr scale with clustering?

That's a pretty general question.


-- 
- Mark


Re: Easy question ? docs with empty geodata field

2012-10-19 Thread Tanguy Moal
Hello,

Did you try q=-geodata:[* TO *] ? (Note the '-' (minus).)
This reads as documents without any value for the field named geodata.

Also, if you plan to use this intensively, you'd better declare a boolean
field telling whether geodata is set, and set a value on each doc,
because -field_name:[* TO *] is an expensive query, especially on large
data sets.
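
A sketch of that approach (the has_geodata field name is just an example) in
schema.xml:

<field name="has_geodata" type="boolean" indexed="true" stored="false" default="false"/>

Set has_geodata to true at index time whenever geodata is present;
fq=has_geodata:false then selects the documents without geodata far more
cheaply than the open-ended range query.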

Regards,

--
Tanguy

2012/10/19 darul daru...@gmail.com

 sorry, I mean this field called geodata in my schema

 <fieldType name="location" class="solr.LatLonType"
 subFieldSuffix="_coordinate"/>
 <field name="geodata" type="location" indexed="true" stored="true"/>



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014752.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Getting count for Multi-Select Faceting

2012-10-19 Thread fbrisbart
Did you think of using 'facet.query'?
Adding 'facet.query=category:Article' to your url should return what
you expect.
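
For the sample query below, that would be (a sketch - the {!ex=scat} local
param is reused from your query so that the selected filter stays excluded
from the count):

http://192.168.160.2:8983/solr/select?fq={!tag=scat}category:Article&facet.field={!ex=scat}category&facet.query={!ex=scat}category:Article&q=*:*&facet=true&facet.limit=5&facet.mincount=1

The count for category:Article then comes back under facet_queries rather
than facet_fields.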

Franck Brisbart



On Friday, 19 October 2012 at 15:18 +0200, Stephane Gamard wrote:
 Hi all, 
 
 Congrats on the 4.0.0 delivery, it's a pleasure to work with! 
 
 I have a small problem that I am trying to resolve elegantly: while using 
 multi-select faceting it might happen that a facet is selected which is not 
 part of the facet list (due to the limit, for example). When executing the query I 
 cannot then get that facet value's count, as it is still outside the scope of 
 the limit. 
 
 for a sample query: 
 http://192.168.160.2:8983/solr/select?fq={!tag=scat}category:Article&facet.field={!ex=scat}category&q=*:*&facet=true&facet.limit=5&facet.mincount=1
 
 I have the following results:
 
 <lst name="facet_fields">
   <lst name="category">
     <int name="Organic Papers">6225</int>
     <int name="Metal-Organic Papers">3055</int>
     <int name="Research Papers">236</int>
     <int name="Inorganic Papers">187</int>
     <int name="Addenda and Errata">59</int>
   </lst>
 </lst>
 
 Note that the facet (category:Article) is not present within the facet_fields 
 result. I've thought of running 2 facet queries, where one is not tagged, and 
 merging the 2 lists within the UI. Is that the best solution available, or 
 should the facet of the fq be present (as sticky) within the facet list? 
 
 Cheers, 
 
 _Stephane




Benchmarking/Performance Testing question

2012-10-19 Thread Amit Nithian
Hi all,

I know there have been many posts about this already and I have done
my best to read through them but one lingering question remains. When
doing performance testing on a Solr instance (under normal production
like circumstances, not the ones where commits are happening more
frequently than necessary), is there any value in performance testing
against a server with caches *disabled* with a profiler hooked up to
see where queries in the absence of a cache are spending the most
time?

The reason I am asking this is to tune things like field types: using
tint vs regular int, different precision steps, etc. Or maybe sorting
is taking a long time and the profiler shows an inordinate amount of
time spent there, so we find a different way to solve that
particular problem. Perhaps we are faceting on something bad, etc. Then
we can optimize those to at least not be as slow, and then ensure that
caching is tuned properly so that cache misses don't yield these
expensive spikes.

I'm trying to devise a proper performance testing for any new
features/config changes and wanted to get some feedback on whether or
not this approach makes sense. Of course performance testing against a
typical production setup *with* caching will also be done to make sure
things behave as expected.

Thanks!
Amit


Solr-4.0.0 DIH not indexing xml attributes

2012-10-19 Thread Billy Newman
Hello all,

I am having problems indexing xml attributes using the DIH.

I have the following xml:

<root>
  <Stuff attr1="some attr" attr2="another attr">
  ...
  </Stuff>
</root>

I am using the following XPath for my fields:
<field column="attr1" xpath="/root/Stuff/@attr1" />
<field column="attr2" xpath="/root/Stuff/@attr2" />


However nothing is getting inserted into my index.

I am pretty sure this should work so I have no idea what is wrong.

Can anyone else confirm that this is a problem?  Or is it just me?

Thanks,
Billy


Re: Easy question ? docs with empty geodata field

2012-10-19 Thread Amit Nithian
What about querying on the dynamic lat/long field to see if there are
documents that do not have the dynamic _latlon0 or whatever defined?

On Fri, Oct 19, 2012 at 8:17 AM, darul daru...@gmail.com wrote:
 I have already tried but get a nice exception because of this field type :




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014763.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0: ClassNotFoundException DataImportHandler

2012-10-19 Thread srinalluri
Thanks Chris for your reply. I really need some help here.

1) If I put the apache-solr-dataimporthandler-*.jar files in the solr/lib
folder, the jar files are loaded - I see that in the Tomcat logs. But in the
end it says 'ClassNotFoundException: DataImportHandler'.

2) So if I remove apache-solr-dataimporthandler-*.jar from the solr/lib folder
and place them in the tomcat/lib folder, there is no more ClassNotFoundException. But
this time it says 'Error Instantiating Request Handler,
org.apache.solr.handler.dataimport.DataImportHandler failed to instantiate
org.apache.solr.request.SolrRequestHandler'.
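
(For reference: the stock Solr 4.0 examples load the DIH jars through <lib/>
directives in solrconfig.xml rather than from tomcat/lib - a sketch, since the
dir path depends on your installation layout:

<lib dir="../../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

Having the jars on Tomcat's shared classloader instead of Solr's can explain
the 'failed to instantiate SolrRequestHandler' error, because the two
classloaders end up seeing different copies of the Solr classes.)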








--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorl-4-0-ClassNotFoundException-DataImportHandler-tp4014348p4014770.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Easy question ? docs with empty geodata field

2012-10-19 Thread darul
Your idea looks great, but with this schema info:

<fieldType name="point" class="solr.PointType" dimension="2"
subFieldSuffix="_d"/>
<fieldType name="location" class="solr.LatLonType"
subFieldSuffix="_coordinate"/>
<fieldtype name="geohash" class="solr.GeoHashField"/>
.

<field name="geodata" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true"
stored="false" />

How can I use it?

fq=location_coordinate:[1 TO *] is not working, for instance





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014779.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlighter isn't highlighting what is matched in query analyzer

2012-10-19 Thread Ali Nabavi
Hi, all.

The content I'm trying to index contains dollar signs that should be
indexed and matched, e.g., $1.

I've set up my schema to index the dollar sign, and am able to successfully
match it with the query analyzer; searching for $1 matches $1.

However, the highlighter doesn't seem to recognize the dollar sign.  When I
submit a query for "$1", the results do contain highlighted matches, but
the highlights appear like $<em>1</em>; the dollar sign is not
highlighted.

How can I ensure that the highlighter will highlight the entirety of what
is matched in the query analyzer tool?

-Ali


[/solr] memory leak prevent tomcat shutdown

2012-10-19 Thread Jie Sun
Very often when we try to shut down Tomcat, we get the following error in
catalina.out indicating a Solr thread cannot be stopped; Tomcat ends up
hanging and we have to kill -9, which we think leads to some core corruption in
our production environment. Please help ...

catalina.out:

... ...

Oct 19, 2012 10:17:22 AM org.apache.catalina.loader.WebappClassLoader
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named
[pool-69-thread-1] but has failed to stop it. This is very likely to create
a memory leak.

Then I used kill -3 to signal the thread dump, here is what I get (note the
thread [pool-69-thread-1] is hanging) :

2012-10-19 10:18:39
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.2-b06 mixed mode):

DestroyJavaVM prio=10 tid=0x55b39800 nid=0x7e82 waiting on
condition [0x]
   java.lang.Thread.State: RUNNABLE

pool-69-thread-1 prio=10 tid=0x2aaabcb41800 nid=0x19fa waiting on
condition [0x4205e000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0x0006de699d80 (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(Unknown Source)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown
Source)
at java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.getTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

JDWP Transport Listener: dt_socket daemon prio=10 tid=0x578aa000
nid=0x19f9 runnable [0x]
   java.lang.Thread.State: RUNNABLE

... ...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [/solr] memory leak prevent tomcat shutdown

2012-10-19 Thread Jie Sun
by the way, I am running tomcat 6, solr 3.5 on redhat 2.6.18-274.el5 #1 SMP
Fri Jul 8 17:36:59 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-memory-leak-prevent-tomcat-shutdown-tp4014788p4014792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Easy question ? docs with empty geodata field

2012-10-19 Thread Amit Nithian
So here is my spec for lat/long (similar to yours except I explicitly
define the sub-field names for clarity):

<fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<field name="location" type="latLon" indexed="true" stored="true"/>
<!-- Could use dynamic fields here but prefer explicitly defining them
so it's clear what's going on. The LatLonType looks to be a wrapper
around these fields? -->
<field name="location_0_latLon" type="tdouble" indexed="true" stored="true"/>
<field name="location_1_latLon" type="tdouble" indexed="true" stored="true"/>

So then the query would be location_0_latLon:[* TO *].

Looking at your schema (field geodata with subFieldSuffix _coordinate), my guess would be:
geodata_0_coordinate:[* TO *]
geodata_1_coordinate:[* TO *]

Let me know if that helps
Amit

On Fri, Oct 19, 2012 at 9:37 AM, darul daru...@gmail.com wrote:
 Your idea looks great but with this schema info :

  <fieldType name="point" class="solr.PointType" dimension="2"
 subFieldSuffix="_d"/>
 <fieldType name="location" class="solr.LatLonType"
 subFieldSuffix="_coordinate"/>
 <fieldtype name="geohash" class="solr.GeoHashField"/>
 .

 <field name="geodata" type="location" indexed="true" stored="true"/>
 <dynamicField name="*_coordinate" type="tdouble" indexed="true"
 stored="false" />

 How can I use it ?

 fq=location_coordinate:[1 to *] not working by instance





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Easy-question-docs-with-empty-geodata-field-tp4014751p4014779.html
 Sent from the Solr - User mailing list archive at Nabble.com.


number and minus operator

2012-10-19 Thread calmsoul
I have a document with the name ABC 102030 XYZ. If I search for this document
with ABC and -10 then I don't get this document (which is correct behavior),
but when I do ABC and -10 I don't get the correct result back.  Any
explanation around this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/number-and-minus-operator-tp4014794.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0.0 - index version and generation not changed after delete by query on master

2012-10-19 Thread Bill Au
It's not the browser cache.  I have tried reloading the admin page and
accessing the admin page from another machine.  Both show the older index
version and generation.  On the slave, replication did kick in and shows
the new index version and generation for the slave.  But the slave admin
page also shows the older index version and generation for the master.

If I do a second delete by query on the master, the master index generation
reported by the admin UI does go up by one on both the master and slave.  But
it is still one generation behind.

Bill

On Fri, Oct 19, 2012 at 7:09 AM, Erick Erickson erickerick...@gmail.comwrote:

 I wonder if you're getting hit by the browser caching the admin page and
 serving up the old version? What happens if you try from a different
 browser or purge the browser cache?

 Of course you have to refresh the master admin page, there's no
 automatic update but I assume you did that.

 Best
 Erick

 On Thu, Oct 18, 2012 at 1:59 PM, Bill Au bill.w...@gmail.com wrote:
  Just discovered that the replication admin REST API reports the correct
  index version and generation:
 
  http://master_host:port/solr/replication?command=indexversion
 
  So is this a bug in the admin UI?
 
  Bill
 
  On Thu, Oct 18, 2012 at 11:34 AM, Bill Au bill.w...@gmail.com wrote:
 
  I just upgraded to Solr 4.0.0.  I noticed that after a delete by query,
  the index version, generation, and size remain unchanged on the master
 even
  though the documents have been deleted (num docs changed and those
 deleted
  documents no longer show up in query responses).  But on the slave both
 the
  index version, generation, and size are updated.  So I thought the master
  and slave were out of sync but in reality that is not true.
 
  What's going on here?
 
  Bill
 



Re: Solr 4.0 copyField not applying index analyzers

2012-10-19 Thread Jack Krupansky
What exactly is the precise symptom? Give us an example with the field names of 
the source and dest, and what precise value is in fact being indexed. Is the 
entire field value being indexed as a single term/string (i.e., the analyzer is not 
being applied)? Or what?


-- Jack Krupansky

-Original Message- 
From: davers

Sent: Friday, October 19, 2012 2:51 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.0 copyField not applying index analyzers

I am upgrading from Solr 3.6 to Solr 4.0 and my copyFields do not seem to be
applying the index analyzers. I'm sure there is something I'm missing in my
schema.xml. I am also using a DIH but I'm not sure that matters.

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.5">

<fields>

  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="groupid" type="string" indexed="true" stored="false"/>
  <field name="siteid" type="int" indexed="true" stored="false"
multiValued="true"/>
  <field name="sku" type="textTight" indexed="true" stored="true"
multiValued="true"/>
  <field name="upc" type="textTight" indexed="true" stored="true"
multiValued="true"/>
  <field name="productID" type="textTight" indexed="true" stored="true"/>
  <field name="manufacturer" type="text" indexed="true" stored="true" />
  <field name="productTitle" type="text" indexed="true" stored="true"/>
  <field name="categoryId" type="int" indexed="true" stored="false"
multiValued="true"/>
  <field name="categoryName" type="text" indexed="true" stored="false"
multiValued="true"/>
  <field name="theme" type="text" indexed="true" stored="false"/>
  <field name="description" type="text" indexed="false" stored="false"/>
  <field name="weight" type="tfloat" indexed="true" stored="false"/>
  <field name="price" type="tfloat" indexed="true" stored="false"/>
  <field name="popularity" type="tint" indexed="true" stored="false"
default="0"/>
  <field name="inStock" type="boolean" indexed="true" stored="false"
multiValued="true"/>
  <field name="onSale" type="boolean" indexed="true" stored="false"/>
  <field name="hasDigiCast" type="boolean" indexed="true" stored="false"/>
  <field name="hasDigiVista" type="boolean" indexed="true" stored="false"/>
  <field name="isNew" type="boolean" indexed="true" stored="false"/>
  <field name="isTopSeller" type="boolean" indexed="true" stored="false"/>
  <field name="finish" type="text" indexed="true" stored="true"
multiValued="true"/>
  <field name="masterFinish" type="text" indexed="true" stored="false"
multiValued="true"/>
  <field name="series" type="text" indexed="true" stored="false"/>
  <field name="searchKeyword" type="text_ws" indexed="true" stored="false"
multiValued="true"/>
  <field name="discontinued" type="boolean" indexed="true" stored="false"
/>
  <field name="spell" type="textSpell" indexed="true" stored="true"
multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="imageURL" type="string" indexed="false" stored="true" />
  <field name="productURL" type="string" indexed="false" stored="true" />

  <field name="productID_sort" type="string" indexed="true" stored="true"
multiValued="false"/>

  <field name="text" type="text" indexed="true" stored="true"
multiValued="true"/>

  <field name="modifiedDate" type="date" indexed="true" stored="true"
multiValued="false" default="NOW"/>
  <field name="productAddDate" type="tdate" indexed="true" stored="true"
multiValued="false" default="NOW"/>

  <field name="textnge" type="autocomplete_edge" indexed="true"
stored="true" multiValued="true" />

  <field name="textng" type="autocomplete_ngram" indexed="true"
stored="true" multiValued="true" omitNorms="true"
omitTermFreqAndPositions="true" />

  <field name="textphon" type="text_phonetic_do" indexed="true"
stored="true" multiValued="true" omitNorms="true"
omitTermFreqAndPositions="true" />

  <dynamicField name="*_i"  type="int"     indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_s"  type="string"  indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_l"  type="long"    indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_t"  type="text"    indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_b"  type="boolean" indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_f"  type="float"   indexed="true"  stored="false"
multiValued="true"/>
  <dynamicField name="*_d"  type="double"  indexed="true"  stored="false"
multiValued="true"/>

  <dynamicField name="*_coordinate"  type="tdouble" indexed="true"
stored="false" />

  <dynamicField name="*_dt"  type="date"   indexed="true"  stored="true"/>
  <dynamicField name="*_dts" type="date"   indexed="true"  stored="true"
multiValued="true"/>

  <dynamicField name="*_ti"  type="tint"    indexed="true"  stored="true"/>
  <dynamicField name="*_tl"  type="tlong"   indexed="true"  stored="true"/>
  <dynamicField name="*_tf"  type="tfloat"  indexed="true"  stored="true"/>
  <dynamicField name="*_td"  type="tdouble" indexed="true"  stored="true"/>
  <dynamicField name="*_tdt" type="tdate"   indexed="true"  stored="true"/>

  <dynamicField name="*_pi"  type="pint"    indexed="true"  stored="true"/>

  <dynamicField name="attr_*" type="text" indexed="true" stored="true"
multiValued="true"/>

  <dynamicField name="random_*" type="random" />

</fields>

<uniqueKey>id</uniqueKey>

 <copyField source="productTitle" dest="text"/>
 <copyField source="manufacturer" dest="text"/>
 <copyField source="description" dest="text"/>
 <copyField source="productID" dest="text"/>
 <copyField 

Re: need help with exact match search

2012-10-19 Thread Jack Krupansky
Because you used solr.StandardTokenizerFactory, which will tokenize terms at 
some delimiters - such as the hyphens that surround your errant 404 case.

Try solr.WhitespaceTokenizerFactory or solr.KeywordTokenizerFactory.

And maybe rename your field type from text_general_trim to text_exact, 
since general implies a general text analyzer.

Test your field type changes on the Solr Admin Analysis page.
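
For example, a field type along those lines might look like this (a sketch -
the trim filter is an assumption based on your text_general_trim name):

<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>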

-- Jack Krupansky

-Original Message- 
From: geeky2

Sent: Friday, October 19, 2012 5:20 PM
To: solr-user@lucene.apache.org
Subject: need help with exact match search

environment: solr 3.5

Hello,

I have a query for an exact match that is bringing back one (1) additional
record that is NOT an exact match.

When I do an exact match search for 404 I should get back three (3)
documents, *but I get back an additional record, with an
itemModelNoExactMatchStr of DUS-404-19*.

Can someone help me understand what I am missing or not setting up
correctly?


response from solr with 4 documents

<?xml version="1.0"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="sort">itemModelNoExactMatchStr asc</str>
      <str name="fq">itemType:2</str>
      <str name="echoParams">all</str>
      <str name="qf">itemModelNoExactMatchStr^30.0</str>
      <str name="q.alt">*:*</str>
      <str name="rows">50</str>
      <str name="defType">edismax</str>
      <str name="debugQuery">true</str>
      *<str name="q">itemModelNoExactMatchStr:404</str>*
      <str name="qt">modelItemNoSearch</str>
      <str name="rows">50</str>
      <str name="facet">false</str>
    </lst>
  </lst>
  *<result name="response" numFound="4" start="0">*
    <doc>
      <arr name="divProductTypeDesc">
        <str>Kitchen Equipment*</str>
      </arr>
      <str name="divProductTypeId">0212020</str>
      <str name="id">0212020,0431  ,404</str>
      <str name="itemModelDesc">ELECTRIC GENERAL SLICER WITH VACU BASE</str>
      *<str name="itemModelNo">404</str>*
      <str name="itemModelNoExactMatchStr">404</str>
      <int name="itemType">2</int>
      <int name="partCnt">13</int>
      <arr name="plsBrandDesc">
        <str>GENERAL</str>
      </arr>
      <str name="plsBrandId">0431  </str>
      <int name="rankNo">0</int>
    </doc>
    <doc>
      <arr name="divProductTypeDesc">
        <str>Vacuum, Canister</str>
      </arr>
      <str name="divProductTypeId">0642000</str>
      <str name="id">0642000,0517  ,404</str>
      <str name="itemModelDesc">HOOVER </str>
      <str name="itemModelNo">404</str>
      *<str name="itemModelNoExactMatchStr">404</str>*
      <int name="itemType">2</int>
      <int name="partCnt">48</int>
      <arr name="plsBrandDesc">
        <str>HOOVER</str>
      </arr>
      <str name="plsBrandId">0517  </str>
      <int name="rankNo">0</int>
    </doc>
    <doc>
      <arr name="divProductTypeDesc">
        <str>Power roller</str>
      </arr>
      <str name="divProductTypeId">0733200</str>
      <str name="id">0733200,1164  ,404</str>
      <str name="itemModelDesc">POWER PAINTER</str>
      <str name="itemModelNo">404</str>
      *<str name="itemModelNoExactMatchStr">404</str>*
      <int name="itemType">2</int>
      <int name="partCnt">39</int>
      <arr name="plsBrandDesc">
        <str>WAGNER</str>
      </arr>
      <str name="plsBrandId">1164  </str>
      <int name="rankNo">0</int>
    </doc>
    <doc>
      <arr name="divProductTypeDesc">
        <str>Dishwasher^</str>
      </arr>
      <str name="divProductTypeId">013</str>
      <str name="id">013,0164  ,DUS-404-19</str>
      <str name="itemModelDesc">DISHWASHERS</str>
      <str name="itemModelNo">DUS-404-19 </str>
      *<str name="itemModelNoExactMatchStr">DUS-404-19</str>*
      <int name="itemType">2</int>
      <int name="partCnt">185</int>
      <arr name="plsBrandDesc">
        <str>CALORIC</str>
      </arr>
      <str name="plsBrandId">0164  </str>
      <int name="rankNo">0</int>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">itemModelNoExactMatchStr:404</str>
    <str name="querystring">itemModelNoExactMatchStr:404</str>
    <str name="parsedquery">+itemModelNoExactMatchStr:404</str>
    <str name="parsedquery_toString">+itemModelNoExactMatchStr:404</str>
    <lst name="explain">
      <str name="0212020,0431  ,404">
10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 4745495),
product of:
  1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1)
  10.053003 = idf(docFreq=971, maxDocs=8304922)
  1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=4745495)
</str>
      <str name="0642000,0517  ,404">
10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 4781972),
product of:
  1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1)
  10.053003 = idf(docFreq=971, maxDocs=8304922)
  1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=4781972)
</str>
      <str name="0733200,1164  ,404">
10.053003 = (MATCH) fieldWeight(itemModelNoExactMatchStr:404 in 8186768),
product of:
  1.0 = tf(termFreq(itemModelNoExactMatchStr:404)=1)
  10.053003 = idf(docFreq=971, maxDocs=8304922)
  1.0 = fieldNorm(field=itemModelNoExactMatchStr, doc=8186768)
</str>
      <str name="013,0164  ,DUS-404-19 ">
5.0265017 = (MATCH) 

Re: need help with exact match search

2012-10-19 Thread geeky2
hello jack,

thank you very much for the reply - i will re-test and let you know.

really appreciate it ;)

thx
mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-help-with-exact-match-search-tp4014832p4014848.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Transient commit errors during autocommit

2012-10-19 Thread Casey Callendrello

Lance,
I have seen this error when the Solr process hit the maximum file 
descriptor limit (because the commit triggered an optimize). Make sure your 
max fds limit is set as high as possible. In my case, 1024 was not nearly 
sufficient.


--Casey


On 10/19/12 6:20 PM, Lance Norskog wrote:

When a transient error happens during an autocommit, the error does not cause a 
safe rollback or notify the user there was a problem. Instead, there is a write 
lock failure and Solr has to be restarted. It runs fine after restart.

Is this a known problem? Is it fixable? Is it unit-test-able?









Re: Solr-4.0.0 DIH not indexing xml attributes

2012-10-19 Thread Lance Norskog
Do other fields get added?
Do these fields have type problems? I.e., is 'attr1' a number and you are adding 
a string?
There is a logging EP that I think shows the data found - I don't know how to 
use it.
Is it possible to post the whole DIH script?
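
For reference, a minimal complete DIH config for that XML might look like this
(a sketch - the file path, entity name, and forEach value are assumptions):

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="stuff" processor="XPathEntityProcessor"
            url="/path/to/file.xml" forEach="/root/Stuff">
      <field column="attr1" xpath="/root/Stuff/@attr1"/>
      <field column="attr2" xpath="/root/Stuff/@attr2"/>
    </entity>
  </document>
</dataConfig>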

- Original Message -
| From: Billy Newman newman...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Friday, October 19, 2012 9:06:08 AM
| Subject: Solr-4.0.0 DIH not indexing xml attributes
| 
| Hello all,
| 
| I am having problems indexing xml attributes using the DIH.
| 
| I have the following xml:
| 
| <root>
|   <Stuff attr1="some attr" attr2="another attr">
|   ...
|   </Stuff>
| </root>
| 
| I am using the following XPath for my fields:
| <field column="attr1" xpath="/root/Stuff/@attr1" />
| <field column="attr2" xpath="/root/Stuff/@attr2" />
| 
| 
| However nothing is getting inserted into my index.
| 
| I am pretty sure this should work so I have no idea what is wrong.
| 
| Can anyone else confirm that this is a problem?  Or is it just me?
| 
| Thanks,
| Billy
| 


Re: Benchmarking/Performance Testing question

2012-10-19 Thread Otis Gospodnetic
Hi Amit,

I'm not sure I follow what you are after...
Yes, seeing how queries that result in cache misses perform is
valuable (esp. if you have low cache hit rate in production)
But figuring out if you chose a bad field type or bad faceting method
or  doesn't require profiling - you can review configs and logs
and such and quickly find performance issues.

In production (or dev, really, too) you can use tools like SPM for
Solr or NewRelic.  SPM will show you performance breakdown over all
Solr SearchComponents used in searches.  NewRelic has non-free plans
that also let you do on-demand profiling, so you could profile Solr in
production, which can be handy.

HTH,
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 19, 2012 at 12:02 PM, Amit Nithian anith...@gmail.com wrote:
 Hi all,

 I know there have been many posts about this already and I have done
 my best to read through them but one lingering question remains. When
 doing performance testing on a Solr instance (under normal production
 like circumstances, not the ones where commits are happening more
 frequently than necessary), is there any value in performance testing
 against a server with caches *disabled* with a profiler hooked up to
 see where queries in the absence of a cache are spending the most
 time?

 The reason I am asking this is to tune things like field types, using
 tint vs regular int, different precision steps etc. Or maybe sorting
 is taking a long time and the profiler shows an inordinate amount of
 time spent there etc. so either we find a different way to solve that
 particular problem. Perhaps we are faceting on something bad etc. Then
 we can optimize those to at least not be as slow and then ensure that
 caching is tuned properly so that cache misses don't yield these
 expensive spikes.

 I'm trying to devise a proper performance testing for any new
 features/config changes and wanted to get some feedback on whether or
 not this approach makes sense. Of course performance testing against a
 typical production setup *with* caching will also be done to make sure
 things behave as expected.

 Thanks!
 Amit


Re: diversity of search results?

2012-10-19 Thread Otis Gospodnetic
Hi Paul,

We've done this for a client in the past via a custom SearchComponent
and it worked well.  Yes, it involved some post-processing, but on the
server, not client.  I *think* we saw 10% performance degradation.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Fri, Oct 19, 2012 at 3:26 AM, Paul Libbrecht p...@hoplahup.net wrote:
 Hello SOLR expert,

 yesterday in our group we realized that a danger we may need to face is that 
 a search result includes very similar results.
 Of course, one would expect skimming so that duplicates that show almost the 
 same results in a search result would be avoided but we fear that this is not 
 possible.

 I was wondering if some technology, plugin, or even research exists 
 that would enable a search result to be partially reordered so that 
 diversity is ensured for a first page of results at least.

 I suppose that might be doable by processing the result page and the next 
 (and the five next?) and pushing down some results if they are too similar 
 to previous ones.

 Hope I am being clear.

 Paul


Re: DIH throws NullPointerException when using dataimporter.functions.escapeSql with parent entities

2012-10-19 Thread Lance Norskog
If it worked before and does not work now, I don't think you are doing anything 
wrong :)

Do you have a different version of your JDBC driver?
Can you make a unit test with a minimal DIH script and schema?
Or, scan through all of the JIRA issues against the DIH from your old Solr 
capture date.


- Original Message -
| From: Dominik Siebel m...@dsiebel.de
| To: solr-user@lucene.apache.org
| Sent: Thursday, October 18, 2012 11:22:54 PM
| Subject: Fwd: DIH throws NullPointerException when using 
dataimporter.functions.escapeSql with parent entities
| 
| Hi folks,
| 
| I am currently migrating our Solr servers from a 4.0.0 nightly build
| (approx. November 2011, which worked very well) to the newly released
| 4.0.0 and am running into some issues concerning the existing
| DataImportHandler configurations. Maybe you have an idea where I am
| going wrong here.
| 
| The following lines are a highly simplified excerpt from one of the
| problematic imports:
| 
| <entity name="path" rootEntity="false" query="SELECT p.id, IF(p.name
| IS NULL, '', p.name) AS name FROM path p GROUP BY p.id">
| 
|     <entity name="item" rootEntity="true" query="
|         SELECT
|             i.*,
|             CONVERT('${dataimporter.functions.escapeSql(path.name)}' USING utf8) AS path_name
|         FROM items i
|         WHERE i.path_id = ${path.id}" />
| 
| </entity>
| 
| While this configuration worked without any problem for over half a
| year, after upgrading to 4.0.0-BETA and 4.0.0 the import throws the
| following stack trace and exits:
| 
|  SEVERE: Exception while processing: path document :
| null:org.apache.solr.handler.dataimport.DataImportHandlerException:
| java.lang.NullPointerException
| 
| which is caused by
| 
| Caused by: java.lang.NullPointerException
| at
| 
org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:79)
| 
| In other words: The EvaluatorBag doesn't seem to resolve the given
| path.name variable properly and returns null.
| 
| Does anyone have any idea?
| Appreciate your input!
| 
| Regards
| Dom
|