Re: Solr HTTP Replication Question

2013-01-24 Thread Amit Nithian
Okay, so after some debugging I found the problem. The replication
piece will download the index from the master server and move the files to
the index directory, but during the commit phase these "older" generation
files are deleted and the index is essentially left intact.

I noticed that a full copy is needed if the index is "stale" (meaning that
files in common between the master and slave have different sizes), but I
think a full copy should also be needed if the slave's generation is higher
than the master's. In short, to me it's not sufficient to
simply say a full copy is needed if the slave's index version is >=
the master's index version. I'll create a patch and file a bug along with a
more thorough writeup of how I got into this state.
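
In pseudo-code, the check I have in mind is roughly this (a sketch only; the
names are illustrative, not the actual replication code):

static boolean isFullCopyNeeded(long masterVersion, long masterGeneration,
                                long slaveVersion, long slaveGeneration) {
    // current check: slave's index version is the same as or newer than the master's
    boolean versionSameOrNewer = slaveVersion >= masterVersion;
    // proposed additional check: slave's generation is ahead of the master's
    boolean generationAhead = slaveGeneration > masterGeneration;
    return versionSameOrNewer || generationAhead;
}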

Thanks!
Amit



On Thu, Jan 24, 2013 at 2:33 PM, Amit Nithian  wrote:

> Does Solr's replication look at the generation difference between master
> and slave when determining whether or not to replicate?
>
> To be more clear:
> What happens if a slave's generation is higher than the master yet the
> slave's index version is less than the master's index version?
>
> I looked at the source and didn't seem to see any reason why the
> generation matters other than fetching the file list from the master for a
> given generation. It's too wordy to explain how this happened so I'll go
> into details on that if anyone cares.
>
> Thanks!
> Amit
>


RE: SOLR 4 getting stuck during restart

2013-01-24 Thread vijeshnair
Thanks James for the heads up, and apologies for a delayed response. Here are the
full details about this issue. Mine is an e-com app, so the index contains
the product catalog comprising roughly 13 million products. At this point I
thought of using the index-based dictionary as the best option for the "Did
You Mean" functionality. I am not sure if everyone is facing this issue, but
here is what I am observing as far as the dictionary is concerned. 

Index based dictionary

- I was building the dictionary using the following url, once I completed
the full indexing (see the sketch after this list). For the time being I have
intentionally kept the buildOnCommit and buildOnOptimize options set to false,
as I didn't want them to slow down the full indexing.

http://localhost:8090/solr/select?rows=0&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=jarowinkler
 

- Once I created the dictionary, when I tried to restart my Tomcat I hit
the issue I described before (I waited for around 20 minutes and the
restart didn't happen).
- When I removed the dictionary from the "data" folder, the server restart
started working.
- I have tried spellcheck.collation=false as you suggested, but it
didn't help.
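
For reference, a dictionary like the "jarowinkler" one above comes from a
spellchecker definition along these lines (a sketch; the field name and index
directory are assumptions, the parameter names are the standard ones):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">spell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">false</str>
  </lst>
</searchComponent>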

Direct Spell Checker

I have experimented with the new "DirectSolrSpellChecker", which does not
create a separate dictionary folder but rather builds the spellchecker from the
main index itself. The results were exactly the same as before: I was getting
stuck during restarts. I think the traditional spellchecker would be
better in this case, as you can remove the dictionary, restart, and move it back
as and when required. Since DirectSolrSpellChecker doesn't
create a separate dictionary folder, I am not sure what to remove from the
index so that the server can restart.

James, could you please validate this? It would be a great help
if you could point out any mistakes I am making here. If you think what I am
doing makes sense, I will go ahead and log this bug in JIRA.

Thanks
Vijesh K Nair



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-4-getting-stuck-during-restart-tp4034734p4036163.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get tokenized words in Solr Response

2013-01-24 Thread Romita Saha
Hi Mikhail,

Thanks for your guidance. I found the required information in 
debugQuery=on.

Thanks and regards,
Romita 


From:   Mikhail Khludnev 
To: solr-user , 
Date:   01/24/2013 03:19 PM
Subject:Re: Get tokenized words in Solr Response



Romita,

IIRC you've already asked this, and I replied that everything you need
is in the debugQuery=on output. That format is a little bit verbose, and I
suppose you may have some difficulty finding the necessary info
there. Please provide the debugQuery=on output and I can try to highlight the
necessary info for you.
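
For example, a request like this (core name and query are hypothetical) returns
a "debug" section whose "parsedquery" entry lists the analyzed terms:

http://localhost:8983/solr/collection1/select?q=Search+this+document+named+XYZ-123&debugQuery=on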


On Thu, Jan 24, 2013 at 6:11 AM, Romita Saha
wrote:

> Hi,
>
> I want the tokenized keywords to be displayed in the Solr response. For
> example, my Solr search could be "Search this document named XYZ-123", and
> the tokenizer in schema.xml tokenizes the query as follows:
> "search document xyz 123". I want to get these tokenized words in the
> Solr response. Is it possible?
>
> Thanks and regards,
> Romita




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 



Re: JSON query syntax

2013-01-24 Thread Yonik Seeley
On Thu, Jan 24, 2013 at 8:55 PM, Otis Gospodnetic
 wrote:
> Yes, this is JSON, so right
> there it may be better, but for instance I see "v" here which to a regular
> human may not be as nice as "value" if that is what "v" stands for.

One goal was to reuse the parsers/parameter names.  A completely
disjoint set would certainly lead to confusion.
Concise *common* abbreviations are fine I think - for example we
quickly get used to (and prefer) f(x) over function(variable1)

We could add some aliases though.

-Yonik
http://lucidworks.com


Re: JSON query syntax

2013-01-24 Thread Otis Gospodnetic
Nice, Yonik!
Here is one suggestion. OK, I'm begging you - please don't make
it as hard on the eyes as Local Params. :)  I thought it was just me who
could never get along with Local Params, but I've learned that a number of
people find Local Params very hard to grok.  Yes, this is JSON, so right
there it may be better, but for instance I see "v" here which to a regular
human may not be as nice as "value" if that is what "v" stands for.
Looking at examples from the JIRA issue

{'frange':{'v':'mul(foo_i,2)', 'l':20, 'u':24}}


v is value?

mul is multiply?

what's "l"? left? No, low(er)?

what's "u"? Aha, upper?


I'd rather use a few extra characters and be clear, easily memorable, and
user friendly.  People love ES's JSON API and I have never ever heard
anyone say it's too verbose.

Thanks,
Otis





On Thu, Jan 24, 2013 at 8:44 PM, Yonik Seeley  wrote:

> Although "lucene" syntax tends to be quite concise, nice looking, and
> easy to build by hand (the web browser is a major debugging tool for
> me), some people prefer to use a more "structured" query language
> that's easier to build up programmatically.  XML fits the bill, but
> people tend to prefer JSON these days.
>
> Hence my first quick prototype:
> https://issues.apache.org/jira/browse/SOLR-4351
>
> I'm pretty happy so far with how easily it's fit in with our QParser
> framework, which should generally allow parsers to not care about the
> underlying syntax of queries they need to deal with.
> For example: the "join" qparser uses the query specified by "v", but
> doesn't care if it's in lucene syntax, or if it was part of the JSON.
>
> {'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}}
> {'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}}
>
> Note: replace the single quotes with double quotes before trying it
> out - these are just test strings that have the replacement done in
> the test code so that they are easier to read.
>
> There's a fair bit left to do of course... like how to deal with
> "boost", "cache", "cost", parameter dereferencing, etc.
> Feedback welcome... and hopefully this will be good to go for 4.2
>
> -Yonik
> http://lucidworks.com
>


JSON query syntax

2013-01-24 Thread Yonik Seeley
Although "lucene" syntax tends to be quite concise, nice looking, and
easy to build by hand (the web browser is a major debugging tool for
me), some people prefer to use a more "structured" query language
that's easier to build up programmatically.  XML fits the bill, but
people tend to prefer JSON these days.

Hence my first quick prototype: https://issues.apache.org/jira/browse/SOLR-4351

I'm pretty happy so far with how easily it's fit in with our QParser
framework, which should generally allow parsers to not care about the
underlying syntax of queries they need to deal with.
For example: the "join" qparser uses the query specified by "v", but
doesn't care if it's in lucene syntax, or if it was part of the JSON.

{'join':{'from':'qqq_s', 'to':'www_s', 'v':'id:10'}}
{'join':{'from':'qqq_s', 'to':'www_s', 'v':{'term':{'id':'10'}}}}

Note: replace the single quotes with double quotes before trying it
out - these are just test strings that have the replacement done in
the test code so that they are easier to read.
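
For instance, with the quotes replaced, the second example reads:

{"join":{"from":"qqq_s", "to":"www_s", "v":{"term":{"id":"10"}}}}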

There's a fair bit left to do of course... like how to deal with
"boost", "cache", "cost", parameter dereferencing, etc.
Feedback welcome... and hopefully this will be good to go for 4.2

-Yonik
http://lucidworks.com


RE: solr parsed query dropping special chars

2013-01-24 Thread Tegelberg, Allan
Thanks for the education, Chris.
I pasted the chars into the Index and Query fields on the Analysis panel.

The Index and Query analyzers are almost the same.
On both, the non-Greek characters drop out after WordDelimiterFilter.
The Index analyzer shows a grey background on the words that seem to make it
through all the filters.

WhitespaceTokenizerFactory <-  ∠ ψ Σ • ≤ ≠ • ≥ μ ω φ θ ¢ β √ Ω ° ± Δ #  
SynonymFilterFactory (query only) <- ditto
StopFilterFactory<- ditto
WordDelimiterFilterFactory  <- ψ Σ μ ω φ θ β Ω Δ  now only greeks
LowerCaseFilterFactory  <- ψ σ μ ω φ θ β ω δ  lower case Greeks only
SnowballPorterFilterFactory <- ψ σ μ ω φ θ β ω δ

so I'm thinking I need to change the WordDelimiterFilter properties
{catenateWords=0, catenateNumbers=0, splitOnCaseChange=1, catenateAll=0,
generateNumberParts=1, generateWordParts=1, splitOnNumerics=0}

or copy these strings into a different field name/type without the word delimiter
filter, so that I wouldn't affect the way existing text is being searched.
Sound right?
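
For the second option, a sketch of such a field type (the names here are mine,
not from this thread; preserveOriginal="1" should keep the unsplit token, so
symbols like ∠ survive analysis):

<fieldType name="text_symbols" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- keep the original token alongside any generated parts -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnCaseChange="1" splitOnNumerics="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>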

Allan Tegelberg





-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, January 24, 2013 3:46 PM
To: solr-user@lucene.apache.org
Subject: Re: solr parsed query dropping special chars

: When I search for these characters in the admin query, I can only find the 
Greeks.
: debug shows the parsed query only has greek chars like omega, delta, sigma
: but does not contain others like degree, angle, cent, bullet, less_equal…

this is most likely because of the analyzer you are using for your text field, 
an assumption which can be verified using the Analysis tool in the admin UI to 
see how the various pieces of your query analyzer deal with the input.

My guess is you are using a tokenizer which ignores punctuation.

Don't forget to check your index analyzer as well -- you may not even be 
indexing these punctuation symbols either...

: the response dumps the document and  shows me the chars exist in the 
document..
: angle (∠)

...that's the stored value, the *indexed* text may not contain those terms.


-Hoss


Re: Search strategy - improving search quality for short search terms such as "doll"

2013-01-24 Thread Chris Hostetter

: My next target is searches on simple terms such as "doll" which, in google,
: would return documents about, well, "toy dolls", because that's the most
: common usage of the simple term "doll". But in my index it predominantly
: returns documents about CDs with the song "Doll Face", and "My baby doll" in
: them.

if you have good metadata about your documents, then you might get 
satisfying results using something like the edismax parser with appropriate 
weights on various fields -- you could for example say that "matching 
on the product_title field is important, but matching on a category_name 
is much more important" and thus use something like...

q=doll&qf=product_title^5+category_name^50

..but that only helps you if you have category_name values that match the 
words people are searching for like "Doll"

This type of approach doesn't help you in the case where you might have the 
inverse problem: document (category_name="doll", product_name="My baby") 
showing up first when a user searches for "my baby doll" but the user is 
really trying to find the document (category_name=cd, product_name="my 
baby doll")

it really all depends on your user base and the type of queries you 
expect.

An interesting solution to this problem that I've seen is to pre-process 
the query using a Bayesian classifier to suggest which categories to boost 
on.

Here's a blog on this where the classifier was trained based on the 
keywords & categories of the documents...

http://engineering.wayfair.com/better-lucenesolr-searches-with-a-boost-from-an-external-naive-bayes-classifier/

...but you could also train the classifier using query logs and data about 
what documents users ultimately clicked on (to help you learn that for 
your userbase, people who search for "baby" are typically looking for CDs 
not dolls -- or vice versa)
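
For example, the classifier's suggestion could be applied as a standard
(e)dismax boost query (the category value here is hypothetical):

q=baby+doll&defType=edismax&qf=product_title^5+category_name^50&bq=category_name:cd^20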


: 
:  
: 
: I'm not directly asking how to solve this as much as I'm asking what
: direction I should be looking in to learn what I need to know to tackle the
: general issue myself.
: 
:  
: 
: Left on my own I would start looking at categorizing the CD's into a facet
: called "music", reasonably doable in my dataset. Then I need to reduce the
: boost-value of the entire facet/category of music unless certain pre-defined
: query terms exist, such as [music, cd, song, listen, dvd, , etc.]. 
: 
:  
: 
: I don't yet know how to do all of this, but after a couple more good books I
: should be "dangerous".
: 
:  
: 
: So the question to this list:
: 
:  
: 
: -  Am I on the right track here?  If not, can you point me in a
: direction to go?
: 
:  
: 
:  
: 
: 

-Hoss


Re: Solr load balancer

2013-01-24 Thread Chris Hostetter

: For example perhaps a load balancer that sends multiple queries 
: concurrently to all/some replicas and only keeps the first response 
: might be effective. Or maybe a load balancer which takes account of the 

I know of other distributed query systems that use this approach when 
query speed is more important to people than load, and people who use them 
seem to think it works well.

given that it synthetically multiplies the load of each end user request, 
it's probably not something we'd want to turn on by default, but a 
configurable option certainly seems like it might be handy.
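
As a sketch of the idea using SolrJ (replica URLs are hypothetical, and this is
not how the current LBHttpSolrServer behaves -- it fails over sequentially
rather than racing replicas):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FirstResponseWins {
  // Send the same query to every replica concurrently; keep the first answer.
  public static QueryResponse query(List<String> replicaUrls, final SolrQuery q)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(replicaUrls.size());
    try {
      List<Callable<QueryResponse>> tasks = new ArrayList<Callable<QueryResponse>>();
      for (final String url : replicaUrls) {
        tasks.add(new Callable<QueryResponse>() {
          public QueryResponse call() throws Exception {
            return new HttpSolrServer(url).query(q);
          }
        });
      }
      // invokeAny blocks until one task completes successfully and cancels
      // the rest - i.e. only the first response is kept.
      return pool.invokeAny(tasks);
    } finally {
      pool.shutdownNow();
    }
  }
}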


-Hoss


Re: solr parsed query dropping special chars

2013-01-24 Thread Chris Hostetter
: When I search for these characters in the admin query, I can only find the 
Greeks.
: debug shows the parsed query only has greek chars like omega, delta, sigma
: but does not contain others like degree, angle, cent, bullet, less_equal…

this is most likely because of the analyzer you are using for your text 
field, an assumption which can be verified using the Analysis tool in the 
admin UI to see how the various pieces of your query analyzer deal with the 
input.

My guess is you are using a tokenizer which ignores punctuation.

Don't forget to check your index analyzer as well -- you may not even be 
indexing these punctuation symbols either...

: the response dumps the document and  shows me the chars exist in the 
document..
: angle (∠)

...that's the stored value, the *indexed* text may not contain those 
terms.


-Hoss

Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 5:22 PM, Fadi Mohsen  wrote:
> 
> The reasons we would like to avoid ZooKeeper are:
> * due to lack of knowledge.
> * the amount of work/scripting for developers per module and release
> documentation.
> * the extra steps of patching ZK nodes for QA and operations.
> 
> ZkCLI is a nice tool, but then instead of interacting with one service over
> HTTP, the application needs:
> * extra jar files

We should address this, I think - it really shouldn't require any more than the 
SolrJ jars. Currently it also requires the core jars. Still not as minimal as 
just curl posting, I know.

Testing and reporting on the issue I posted, as well as discussion around 
expanding it, will likely help push those features forward.

- Mark



RE: Sorting on Score Problem

2013-01-24 Thread Kuai, Ben
Hi Hoss

Thanks for the reply. 

Unfortunately we have other customized similarity classes, and I don't know how 
to disable them and still make the query work. 

I will attach more information once I work out how to simplify the issue.

Thanks
Ben

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, January 24, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting on Score Problem

: We met a weird problem in our project when sorting by score in Solr 4.0:
: the document with the biggest score is not at the top. The debug explanation
: from solr looks like this,

that's weird ... can you post the full debugQuery output of an example
query showing the problem, using "echoParams=all" & "fl=id,score" (or
whatever unique key field you have)

also: can you elaborate on whether you are using a single node setup or a
distributed (ie: SolrCloud) query?

: Then we thought it could be a float rounding problem, so we implemented
: our own similarity class to increase queryNorm by 10,000; it changes
: the score scale but the rank is still wrong.

when you post the detailed request above, please don't use your custom
similarity (just the out of the box solr code) so there's one less
variable in the equation.


-Hoss


AW: Does solr 4.1 support field compression?

2013-01-24 Thread André Widhani
These are the figures I got after indexing 4 and half million documents with 
both Solr 3.6.1 and 4.1.0 (and optimizing the index at the end).

  $ du -h --max-depth=1
  67G   ./solr410
  80G   ./solr361

Main contributor to the reduced space consumption is (as expected I guess) the 
.fdt file:

  $ ls -lh solr361/*/*/*.fdt
  29G solr361/core-tex68bohyrh23qs192adaq-index361/index/_bab.fdt

  $ ls -lh solr410/*/*/*.fdt
  18G solr410/core-tex68bohyz1teef3xsjdaw-index410/index/_23uy.fdt

Depends of course on your individual ratio of stored versus indexed-only fields.

André


Von: Shawn Heisey [s...@elyograg.org]
Gesendet: Donnerstag, 24. Januar 2013 16:58
An: solr-user@lucene.apache.org
Betreff: Re: Does solr 4.1 support field compression?

On 1/24/2013 8:42 AM, Ken Prows wrote:
> I didn't see any mention of field compression in the release notes for
> Solr 4.1. Did the ability to automatically compress fields end up
> getting added to this release?

The concept of compressed fields (an option in schema.xml) that existed
in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene
3.0.  Because Lucene and Solr development were combined, the Solr
version after 1.4.1 is 3.1.0; there is no 1.5 or 2.x version of Solr.

Solr/Lucene 4.1 compresses all stored field data by default.  I don't
think there's a way to turn it off at the moment, which is causing
performance problems for a small subset of Solr users.  When it comes
out, Solr 4.2 will also have compressed term vectors.

The release note contains this text:

Stored fields are compressed. (See
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)

It looks like the solr CHANGES.txt file fails to specifically mention
LUCENE-4226  which
implemented compressed stored fields.

Thanks,
Shawn



Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Fadi Mohsen
Thanks Per, would the first approach involve restarting Solr?

Thanks Mark, that's great. I'll try to check out and apply the patches from the
ticket to understand further.
The reasons we would like to avoid ZooKeeper are:
 * due to lack of knowledge.
 * the amount of work/scripting for developers per module and release
documentation.
 * the extra steps of patching ZK nodes for QA and operations.

ZkCLI is a nice tool, but then instead of interacting with one service over
HTTP, the application needs:
 * extra jar files
 * to know the ZK hostname/IP and port (different in each
dev/qa/systest/accept/production environment), which is, per module, one
configuration step too many.


On Thu, Jan 24, 2013 at 7:18 PM, Mark Miller  wrote:

>
> On Jan 24, 2013, at 10:02 AM, Fadi Mohsen  wrote:
>
> > Hi, We would like to use Solr to index statistics from any Java module in
> > our production environment.
> >
> > Applications have to be able to create collections and index data on demand, so
> my
> > initial thought is to use different HTTP methods to accomplish a
> collection
> > in cluster and then right away start HTTP POST documents, but the issue
> > here is the schema.xml.
> > Is it possible to HTTP POST the schema via Solr to Zookeeper?
>
> I've done some work towards this at
> https://issues.apache.org/jira/browse/SOLR-4193
>
> >
> > Or do I have to know about other service host/IP than SOLR, such as
> > ZooKeeper (wanted to understand whether there is a way to avoid knowing
> > about zookeeper in production.)?
>
> I wouldn't try to avoid it - it's probably simpler to deal with than you
> think.
>
> It's also pretty easy to use
> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new
> schema.xml - then just Collections API reload command. Two lines in a
> script.
>
> - Mark
>
>



indexVersion returns multiple results when called

2013-01-24 Thread davidq
Hi,

We have 5 core masters and 5 core slaves. The main core houses about 85,000
documents, so it is small, although the content of each document is quite large.
The second core holds the same number of docs but far less - and different -
data.

We reindex all cores every morning and the replication poll is 5 minutes.
The main core takes 15 minutes to reindex (optimize). At some point, an
incomplete index is picked up by the slave and our web site disappears until
the optimize takes place. I know we could increase the poll to 30 minutes
but that would be no guarantee.

We thought we'd solve it by writing a script to get the indexversion, kick off
reindexing and periodically check the current indexversion against the first
- if the same, sleep for 2 minutes and then check again. Once they're
different, do a fetchIndex from the slave.
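
In outline, the script looks like this (a sketch; hosts and core name are
placeholders, indexversion and fetchindex are the replication handler commands):

BEFORE=$(curl -s 'http://master:8983/solr/main/replication?command=indexversion')
# ... kick off reindexing on the master ...
AFTER=$BEFORE
while [ "$AFTER" = "$BEFORE" ]; do
  sleep 120  # sleep for 2 minutes, then check again
  AFTER=$(curl -s 'http://master:8983/solr/main/replication?command=indexversion')
done
# the version changed, so tell the slave to pull the new index
curl -s 'http://slave:8983/solr/main/replication?command=fetchindex'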

Works on all the cores except the main one. We get a different indexversion
after two minutes, the slave gets populated with an almost empty index and
the site is out!

All the other cores exhibit the same indexversion. What have we
misunderstood or got wrong?

Regards,

David Q




--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexVersion-returns-multiple-results-when-called-tp4036046.html
Sent from the Solr - User mailing list archive at Nabble.com.


PK uniqueness aware Solr index merging?

2013-01-24 Thread Gregg Donovan
We have a Hadoop process that produces a set of Solr indexes from a cluster
of HBase documents. After the job runs, we pull the indexes from HDFS and
merge them together locally. The issue we're running into is that
sometimes we'll have duplicate occurrences of a primary key across indexes
that we'll want merged out. For example, a set of directories with:

./dir00/
doc_id=0
PK=1

./dir01/
doc_id=0
PK=1

should merge into a Solr index containing a single document rather than one
with two Lucene documents each containing PK=1.

The Lucene-level merge code -- i.e. oal.index.SegmentMerger.merge()--
doesn't know about the Solr schema, so it will merge these two directories
into two duplicate documents. It doesn't appear that either Solr's
oas.handler.admin.CoreAdminHandler.handleMergeAction(SolrQueryRequest,
SolrQueryResponse) handles this either, as it ends up passing the list of
merge directories to oal.index.IndexWriter.addIndexes(IndexReader...) via
oas.update.DirectUpdateHandler2.mergeIndexes(MergeIndexesCommand).

So, if I want to merge multiple Solr directories in a way that respects
primary key uniqueness, is there any more efficient manner than re-adding
all of the documents in each directory to a new Solr index to avoid PK
duplicates?

Thanks.

--Gregg

Gregg Donovan
Senior Software Engineer, Etsy.com
gr...@etsy.com


Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread O. Olson
Michael Della Bitta-2 wrote
> On Thu, Jan 24, 2013 at 11:34 AM, O. Olson <

> olson_ord@

> > wrote:
>>
>> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
>> SQLEXPRESS is not configured to listen with TCP/IP.
> 
> 
> That's probably your problem...
> 
> 
> Michael Della Bitta
> 
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
> 
> www.appinions.com
> 
> Where Influence Isn’t a Game


Good call Michael. I did have to enable TCP
(http://msdn.microsoft.com/en-us/library/hh231672.aspx for others who have
the same problem), but I still did not get this to work. 

I then tested my driver, JDBC URL & SQL query in a plain old Java class.
This showed me that it was almost impossible to get integrated
authentication to work in Java. I finally went with specifying the username
and password literally. (I hope this is useful to others):


import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SqlExpressTest {
    public static void main(String[] args) throws Exception {
        String url =
            "jdbc:sqlserver://localhost\\SQLEXPRESS;database=Amazon;user=solrusr;password=solrusr;";
        String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
        Connection connection = null;
        try {
            System.out.println("Loading driver...");
            Class.forName(driver);
            System.out.println("Driver loaded! Attempting Connection ...");
            connection = DriverManager.getConnection(url);
            System.out.println("Connection succeeded!");
            // print every row so we can see the query really works
            ResultSet rs = connection.createStatement().executeQuery(
                "SELECT ProdID, Descr FROM Table_Temp");
            try {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "  " + rs.getString(2));
                }
            } finally {
                rs.close();
            }
            // Success.
        } catch (SQLException e) {
            // swallowed in this quick test
        } finally {
            if (connection != null) try { connection.close(); } catch (SQLException ignore) {}
        }
    }
}

Hence, I modified my db-data-config.xml to use the same connection URL, roughly:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\SQLEXPRESS;database=Amazon;user=solrusr;password=solrusr;"/>
  <document>
    <entity name="Product" query="SELECT ProdID, Descr FROM Table_Temp">
      ...
    </entity>
  </document>
</dataConfig>

This worked for me.

Thanks again Michael & Shawn.
O. O.










--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-SQL-Express-Integrated-Security-Unable-to-execute-query-tp4035758p4036056.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Deletion from database

2013-01-24 Thread Dyer, James
This post on stackoverflow has a good run-down on your options:
http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/1557604#1557604

If you're using DIH, you can get more information from: 
http://wiki.apache.org/solr/DataImportHandler

The easiest thing, if using a delta import, is to add "deletedPkQuery" on your 
entity, like this:
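
For example (table and column names here are just placeholders):

<entity name="item" pk="ID"
        query="SELECT * FROM item"
        deltaQuery="SELECT ID FROM item WHERE last_modified > '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT ID FROM item WHERE deleted = 'true'">
  ...
</entity>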


Another approach is to have a second top-level entity that uses the special 
command:
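
For example, using the special $deleteDocById variable (again with placeholder
names; how you quote the alias depends on your database):

<entity name="deleted_items"
        query="SELECT ID AS '$deleteDocById' FROM item WHERE deleted = 'true'"/>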


This second approach works if you use DIH but do delta updates using the 
approach described here: 
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: hassancrowdc [mailto:hassancrowdc...@gmail.com] 
Sent: Thursday, January 24, 2013 12:19 PM
To: solr-user@lucene.apache.org
Subject: Re: Deletion from database

OK, how can I issue a delete for each item deleted since the last successful
update? Do I write something like a delete query with the delta import query in
dataconfig? If so, what will I add in dataconfig for deletion? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Deletion from database

2013-01-24 Thread hassancrowdc
OK, how can I issue a delete for each item deleted since the last successful
update? Do I write something like a delete query with the delta import query in
dataconfig? If so, what will I add in dataconfig for deletion? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018p4036026.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 10:02 AM, Fadi Mohsen  wrote:

> Hi, We would like to use Solr to index statistics from any Java module in
> our production environment.
> 
> Applications have to be able to create collections and index data on demand, so my
> initial thought is to use different HTTP methods to accomplish a collection
> in cluster and then right away start HTTP POST documents, but the issue
> here is the schema.xml.
> Is it possible to HTTP POST the schema via Solr to Zookeeper?

I've done some work towards this at 
https://issues.apache.org/jira/browse/SOLR-4193

> 
> Or do I have to know about other service host/IP than SOLR, such as
> ZooKeeper (wanted to understand whether there is a way to avoid knowing
> about zookeeper in production.)?

I wouldn't try to avoid it - it's probably simpler to deal with than you think.

It's also pretty easy to use 
http://wiki.apache.org/solr/SolrCloud#Command_Line_Util to upload a new 
schema.xml - then just Collections API reload command. Two lines in a script.
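
Something like (paths, host and names are placeholders):

sh example/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir ./conf -confname myconf
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'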

- Mark



Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread Michael Della Bitta
On Thu, Jan 24, 2013 at 11:34 AM, O. Olson  wrote:
>
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
> SQLEXPRESS is not configured to listen with TCP/IP.


That's probably your problem...


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


Re: Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Mark Miller
This is my fault - I discovered this myself a few days ago. I've been meaning 
to file a jira ticket and have not gotten around to it yet.

You can also work around it like this:

CoreContainer container = new CoreContainer(loader) {
  // workaround since we don't call container#load
  {initShardHandler(null);}
};

- Mark

On Jan 24, 2013, at 9:22 AM, Ted Merchant  wrote:

> We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
> were forced to upgrade a custom query parser.  While the code change itself 
> was minimal, we found that our unit tests stopped working because of a 
> NullPointerException on line 181 of handler.component.SearchHandler:
> ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
> We determined that the cause of this exception was that shardHandlerFactory 
> was never initialized in the solr container.  The reason for this seems to be 
> that the shard handler is setup in core.CoreContainer::initShardHandler which 
> is called from core.CoreContainer::load. 
> When setting up the core container we were using the  public 
> CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
> calls the load method, so initShardHandler is never called and the 
> shardHandler is never initialized. 
> In Solr 4.0.0 the shardHandler was initialized on the calling of 
> getShardHandlerFactory.  This code was modified and moved by revision 
> 1422728: SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
> environments.
>  
> We fixed our issue by using the public CoreContainer(String dir, File 
> configFile) constructor which calls the load method.
> I just wanted to make sure that people were aware of this issue and to 
> determine if it really is an issue or if having the shardHandler be null was 
> expected behavior unless someone called the load(String dir, File configFile 
> ) method.
>  
> Thank you,
>  
> Ted
>  
>  
>  
> Stack trace of error:
> org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.SolrServerException: 
> java.lang.NullPointerException
> at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
> at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
> at 
> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
> at 
> com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
> Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
> at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
> at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
> at 
> org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
> at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
> at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
> at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: org.apache.solr.client.solrj.SolrServerException: 
> java.lang.NullPointerException
>   

Re: zookeeper config

2013-01-24 Thread Mark Miller

On Jan 24, 2013, at 7:05 AM, Shawn Heisey  wrote:

> My experience has been that you put the chroot at the very end, not on every 
> host entry

Yup - this came up on the mailing list not too long ago and it's currently 
correctly documented on the SolrCloud wiki.
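
That is, the chroot goes once at the very end (hosts are placeholders):

-DzkHost=zk1:2181,zk2:2181,zk3:2181/solr

rather than zk1:2181/solr,zk2:2181/solr,zk3:2181/solr.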

- Mark

Re: Deletion from database

2013-01-24 Thread Walter Underwood
The general solution is to add a "deleted" column to your database, or even a 
"deleted date" column.

When you update Solr from the DB, issue a delete for each item deleted since 
the last successful update.
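
For example, the deletes can be posted to the update handler (URL and IDs are 
placeholders):

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><id>123</id><id>456</id></delete>'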

You can delete those rows after the Solr update or to be extra safe, delete 
them a few days later.

For this to work, you must not re-use IDs.

wunder

On Jan 24, 2013, at 10:05 AM, hassancrowdc wrote:

> Hi,
> I am trying to figure out: if I delete anything from my
> database, how does that item get deleted from my indexed data?
> Is there any way I can make a new core with the same config as the existing core,
> do a full index, swap the data with the existing core and delete the new core?
> So every time I delete anything from the database, it creates a new core, indexes
> the data, swaps it and then deletes the new core (that was made)?
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html
> Sent from the Solr - User mailing list archive at Nabble.com.







Deletion from database

2013-01-24 Thread hassancrowdc
Hi,
I am trying to figure out: if I delete anything from my
database, how does that item get deleted from my indexed data?
Is there any way I can make a new core with the same config as the existing core,
do a full index, swap the data with the existing core and delete the new core?
So every time I delete anything from the database, it creates a new core, indexes
the data, swaps it and then deletes the new core (that was made)?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deletion-from-database-tp4036018.html
Sent from the Solr - User mailing list archive at Nabble.com.


Mahout - Solr vs Mahout Lucene Question

2013-01-24 Thread vybe3142
Hi,
I hate to double post, but I'm not sure in which domain the answer to my
question lies, so here's the link to my question on the Mahout list.

Basically, I'm getting different clustering results depending on whether I
index data with Solr or Lucene. Please post any responses against the
original question.

Thanks

http://lucene.472066.n3.nabble.com/Clustering-using-Solr-Index-vs-Lucene-Index-Different-Results-td4036013.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mahout-Solr-vs-Mahout-Lucene-Question-tp4036014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr SQL Express Integrated Security - Unable to execute query

2013-01-24 Thread O. Olson
Shawn Heisey-4 wrote
>> There will be a lot more detail to this error.  This detail may have a 
>> clue about what happened.  Can you include the entire stacktrace?
>> 
>> Thanks,
>>Shawn

Thank you Shawn. The following is the entire stacktrace. I hope this helps:


INFO: Creating a connection for entity Product with URL:
jdbc:sqlserver://localhost;instanceName=SQLEXPRESS;databaseName=Amazon;integratedSecurity=true;
Jan 23, 2013 3:26:05 PM org.apache.solr.core.SolrCore execute
INFO: [db] webapp=/solr path=/dataimport params={command=status} status=0
QTime=1 
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Exception while processing: Product document :
SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT [ProdID],[Descr] FROM
[Amazon].[dbo].[Table_Temp] Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:252)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:209)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:472)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:411)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The server
SQLEXPRESS is not configured to listen with TCP/IP.
at
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:171)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(SQLServerConnection.java:3188)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.primaryPermissionCheck(SQLServerConnection.java:937)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:800)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:700)
at
com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:842)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:160)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:362)
at
org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:239)
... 12 more

Jan 23, 2013 3:26:31 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [db] webapp=/solr path=/dataimport params={command=full-import}
status=0 QTime=13 {deleteByQuery=*:*} 0 13
Jan 23, 2013 3:26:31 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:java.lang.RuntimeException:
java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: SELECT [ProdID],[Descr] FROM [Amazon].[dbo].[Table_Temp]
Processing Document # 1
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:413)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:326)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:234)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: SELECT [ProdID],[Descr] FROM
[Amazon].[dbo].[Table_Temp] Pr

Re: Starting instances with multiple collections

2013-01-24 Thread Per Steffensen
Each node needs a -Dsolr.solr.home pointing to a solr.xml, but the 
configuration-subfolder does not need to be there. It only needs to be 
there for the node you start with -Dbootstrap_confdir (to have it load 
the config into ZK). The next time you start this Solr you do not need 
to provide -Dbootstrap_confdir, since config is already loaded into ZK 
(well unless you run your ZK embedded in the Solr - in this case I 
believe all ZK state is removed when you close the Solr, but that is 
also just for "playing")
In general, IMHO, using a Solr node to load a configuration during 
startup is only for "playing". You ought to load configs into ZK as a 
separate operation from starting Solrs (and creating collections for 
that matter). Also see recent mail-list dialog "Submit schema definition 
using curl via SOLR"


Regards, Per Steffensen

On 1/23/13 11:12 PM, Walter Underwood wrote:

I can get one Solr 4.1 instance up with the config bootstrapped into Zookeeper. 
In zk I see two configs, two collections, and I can run the DIH on the first 
node.

I can get the other two nodes to start and sync if I give them a 
-Dsolr.solr.home pointing to a directory with a solr.xml and subdirectories 
with configuration for each collection. If I don't do that, they look for 
solr/solr.xml, then fail. But what is the point of putting configs in Zookeeper 
if each host needs a copy anyway?

The wiki does not have an example of how to start a cluster with multiple 
collections.

Am I missing something here?

wunder
--
Walter Underwood
wun...@wunderwood.org








Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen

On 1/24/13 4:51 PM, Per Steffensen wrote:


2) or you can have a Solr node (server) load a "Solr config" into ZK 
during startup by adding collection.configName and bootstrap_confdir 
VM params - something like this
java -DzkHost=<zk connect string> -Dcollection.configName=<config name> -Dbootstrap_confdir=<conf dir> -jar start.jar

Well, <config name> instead of edr_sms_conf, of course



Re: Does solr 4.1 support field compression?

2013-01-24 Thread Shawn Heisey

On 1/24/2013 8:42 AM, Ken Prows wrote:

I didn't see any mention of field compression in the release notes for
Solr 4.1. Did the ability to automatically compress fields end up
getting added to this release?


The concept of compressed fields (an option in schema.xml) that existed 
in the 1.x versions of Solr (based on Lucene 2.9) was removed in Lucene 
3.0.  Because Lucene and Solr development were combined, the Solr 
version after 1.4.1 is 3.1.0; there is no 1.5 or 2.x version of Solr.


Solr/Lucene 4.1 compresses all stored field data by default.  I don't 
think there's a way to turn it off at the moment, which is causing 
performance problems for a small subset of Solr users.  When it comes 
out, Solr 4.2 will also have compressed term vectors.


The release note contains this text:

Stored fields are compressed. (See 
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)


It looks like the solr CHANGES.txt file fails to specifically mention 
LUCENE-4226  which 
implemented compressed stored fields.


Thanks,
Shawn



Re: Does solr 4.1 support field compression?

2013-01-24 Thread Ken Prows
Doh! I went straight to the release notes. Thanks, this is the
feature I was waiting for :)

Ken

On Thu, Jan 24, 2013 at 10:49 AM, André Widhani
 wrote:
> This is what it listed under the "Highlights" on the Apache page announcing 
> the Solr 4.1 release:
>
>   "The default codec incorporates an efficient compressed stored fields 
> implementation that compresses chunks of documents together with LZ4. (see 
> http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)"
>
> André
>
> 
> Von: Rafał Kuć [r@solr.pl]
> Gesendet: Donnerstag, 24. Januar 2013 16:45
> An: solr-user@lucene.apache.org
> Betreff: Re: Does solr 4.1 support field compression?
>
> Hello!
>
> It should be turned on by default, because the stored fields
> compression is the behavior of the default Lucene 4.1 codec.
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch
>
>> Hi everyone,
>
>> I didn't see any mention of field compression in the release notes for
>> Solr 4.1. Did the ability to automatically compress fields end up
>> getting added to this release?
>
>> Thanks!,
>> Ken
>


Re: Submit schema definition using curl via SOLR

2013-01-24 Thread Per Steffensen
Basically uploading a "Solr config" (including schema.xml, 
solrconfig.xml etc.) is an operation different from creating 
collections. When creating a collection (e.g. using the Collection API) 
you reference the (already existing) "Solr config" it needs to use. 
Collections can share "Solr config"s. I know of at least two ways to 
load a "Solr config" into ZK using Solr-tools.


1) You can use ZkCLI tool (of course ZK needs to be started) - something 
like this

mkdir -p "${SOLR_INSTALL}/example/webapps/temp"
cp "${SOLR_INSTALL}/example/webapps/solr.war" 
"${SOLR_INSTALL}/example/webapps/temp"

cd "${SOLR_INSTALL}/example/webapps/temp"
jar -xf "solr.war"
java -classpath "${SOLR_INSTALL}/example/webapps/temp/WEB-INF/lib/*" \
org.apache.solr.cloud.ZkCLI -cmd upconfig -confdir <conf dir> \
-confname <config name> --zkhost <zk connect string>


rm -rf "${SOLR_INSTALL}/example/webapps/temp"
Believe there is also a zkcli.sh tool

2) or you can have a Solr node (server) load a "Solr config" into ZK 
during startup by adding collection.configName and bootstrap_confdir VM 
params - something like this
java -DzkHost=<zk connect string> -Dcollection.configName=edr_sms_conf -Dbootstrap_confdir=<conf dir> -jar start.jar


I prefer 1) for several reasons.

Regards, Per Steffensen

On 1/24/13 4:02 PM, Fadi Mohsen wrote:

Hi, We would like to use Solr to index statistics from any Java module in
our production environment.

Applications have to be able to create collections and index data on demand, so my
initial thought is to use different HTTP methods to accomplish a collection
in cluster and then right away start HTTP POST documents, but the issue
here is the schema.xml.
Is it possible to HTTP POST the schema via Solr to Zookeeper?

Or do I have to know about other service host/IP than SOLR, such as
ZooKeeper (wanted to understand whether there is a way to avoid knowing
about zookeeper in production.)?

This must be a duplicate of another question, excuse me in advance.

Regards
Fadi





AW: Does solr 4.1 support field compression?

2013-01-24 Thread André Widhani
This is what it listed under the "Highlights" on the Apache page announcing the 
Solr 4.1 release:

  "The default codec incorporates an efficient compressed stored fields 
implementation that compresses chunks of documents together with LZ4. (see 
http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene)"

André


Von: Rafał Kuć [r@solr.pl]
Gesendet: Donnerstag, 24. Januar 2013 16:45
An: solr-user@lucene.apache.org
Betreff: Re: Does solr 4.1 support field compression?

Hello!

It should be turned on by default, because the stored fields
compression is the behavior of the default Lucene 4.1 codec.

--
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi everyone,

> I didn't see any mention of field compression in the release notes for
> Solr 4.1. Did the ability to automatically compress fields end up
> getting added to this release?

> Thanks!,
> Ken



Re: Does solr 4.1 support field compression?

2013-01-24 Thread Rafał Kuć
Hello!

It should be turned on by default, because the stored fields
compression is the behavior of the default Lucene 4.1 codec.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> Hi everyone,

> I didn't see any mention of field compression in the release notes for
> Solr 4.1. Did the ability to automatically compress fields end up
> getting added to this release?

> Thanks!,
> Ken



Does solr 4.1 support field compression?

2013-01-24 Thread Ken Prows
Hi everyone,

I didn't see any mention of field compression in the release notes for
Solr 4.1. Did the ability to automatically compress fields end up
getting added to this release?

Thanks!,
Ken


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread Naresh
Hi,
You can fetch all the stored fields by passing them as part of the *fl*
parameter. Go through
http://wiki.apache.org/solr/CommonQueryParameters#fl
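
For the product title and image fields mentioned below, that would be something
like &fl=title,image (or &fl=*,score to return every stored field plus the score).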


On Thu, Jan 24, 2013 at 8:56 PM, AnnaVak  wrote:

> Thanks for your solution, it works for me too. I'm new to Solr, but how can I
> additionally fetch other fields, not only the field that was used for
> searching? For example I have product title and image fields and I want to
> get the title but also the image related to this title. How can I do this?
>
> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards
Naresh


Submit schema definition using curl via SOLR

2013-01-24 Thread Fadi Mohsen
Hi, We would like to use Solr to index statistics from any Java module in
our production environment.

Applications have to be able to create collections and index data on demand, so my
initial thought is to use different HTTP methods to accomplish a collection
in cluster and then right away start HTTP POST documents, but the issue
here is the schema.xml.
Is it possible to HTTP POST the schema via Solr to Zookeeper?

Or do I have to know about other service host/IP than SOLR, such as
ZooKeeper (wanted to understand whether there is a way to avoid knowing
about zookeeper in production.)?

This must be a duplicate of another question, excuse me in advance.

Regards
Fadi


Re: Problem with migration from solr 3.5 with SOLR-2155 usage to solr 4.0

2013-01-24 Thread Viacheslav Davidovich
Hi David,

thank you for your answer.

After updating to this field type and changing the Solr query I get the required 
behavior.

Also, could you update the wiki page to add the Maven artifact coordinates after 
the words "it needs to be in WEB-INF/lib in Solr's war file, basically", like 
this?


<dependency>
  <groupId>com.vividsolutions</groupId>
  <artifactId>jts</artifactId>
  <version>1.13</version>
</dependency>

I think this may help users who use Maven.

WBR Viacheslav.

On 23.01.2013, at 19:24, Smiley, David W. wrote:

> Viacheslav,
> 
> 
> SOLR-2155 is only compatible with Solr 3.  However the technology it is
> based on lives on in Lucene/Solr 4 in the
> "SpatialRecursivePrefixTreeFieldType" field type.  In the example schema
> it's registered under the name "location_rpt".  For more information on
> how to use this field type, see: SpatialRecursivePrefixTreeFieldType
> 
> ~ David Smiley
> 
> On 1/23/13 11:11 AM, "Viacheslav Davidovich"
>  wrote:
> 
>> Hi, 
>> 
>> With Solr 3.5 I use the SOLR-2155 plugin to filter the documents by distance
>> as described in
>> http://wiki.apache.org/solr/SpatialSearch#Advanced_Spatial_Search and
>> this solution perfectly filters the multiValued data defined in schema.xml
>> like
>>
>> <fieldType name="geohash" class="solr2155.solr.schema.GeoHashField" length="12" />
>>
>> <field name="location_data" type="geohash" indexed="true" stored="true" multiValued="true"/>
>> 
>> the query looks like this with Solr 3.5:
>> q=*:*&fq={!geofilt}&sfield=location_data&pt=45.15,-93.85&d=50&sort=geodist() asc
>> 
>> As the SOLR-2155 plugin is not compatible with Solr 4.0, I tried to change the
>> field definition to:
>>
>> <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate" />
>>
>> <field name="location_data" type="location" indexed="true" stored="true" multiValued="true"/>
>>
>> <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" />
>> 
>> But in this case, after executing geofilt on location_data, the correct
>> values are returned only if the field has one value; if more than one value is
>> stored in the index, the required documents are returned only when all the
>> location points are matched.
>> 
>> Does anybody have experience or any ideas on how to get the same behavior in
>> Solr 4.0 as in Solr 3.5 with the SOLR-2155 plugin?
>> 
>> Is this possible at all, or do I need to refactor the document structure and
>> field definition to store only one location value per document?
>> 
>> WBR Viacheslav.
>> 
> 
> 



Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Ted Merchant
We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
were forced to upgrade a custom query parser.  While the code change itself was 
minimal, we found that our unit tests stopped working because of a 
NullPointerException on line 181 of handler.component.SearchHandler:
ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
We determined that the cause of this exception was that shardHandlerFactory was 
never initialized in the solr container.  The reason for this seems to be that 
the shard handler is set up in core.CoreContainer::initShardHandler which is 
called from core.CoreContainer::load.
When setting up the core container we were using the  public 
CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
calls the load method, so initShardHandler is never called and the shardHandler 
is never initialized.

In Solr 4.0.0 the shardHandler was initialized on the calling of 
getShardHandlerFactory.  This code was modified and moved by revision 1422728: 
SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
environments.

We fixed our issue by using the public CoreContainer(String dir, File 
configFile) constructor, which calls the load method.
I just wanted to make sure that people were aware of this issue, and to 
determine whether it really is an issue or whether having the shardHandler be 
null is expected behavior unless someone calls the load(String dir, File 
configFile) method.
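
For anyone hitting the same NPE in embedded tests, a minimal sketch of the
workaround described above (the solr.xml path and core name are illustrative):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class TestServerFactory {
    public static SolrServer create() {
        // This constructor calls load(), which initializes the
        // shardHandlerFactory before any request is handled.
        CoreContainer container =
                new CoreContainer("solr", new File("solr/solr.xml"));
        return new EmbeddedSolrServer(container, "collection1");
    }
}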

Thank you,

Ted



Stack trace of error:
org.apache.solr.client.solrj.SolrServerException: 
org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at 
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 27 more
Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:15

Re: AW: AW: auto completion search with solr using NGrams in SOLR

2013-01-24 Thread AnnaVak
Thanks for your solution, it works for me too. I'm new to Solr, though: how
can I additionally fetch other fields, not only the field that was used for
searching? For example, I have product title and image fields, and I want to
get the title but also the image related to that title. How can I do this?

Thanks in advance 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4035931.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr autocomplete feature

2013-01-24 Thread Ilayaraja . P
Hi

 I want to change the autocomplete implementation for our search. Currently I 
have a suggest field whose definition in schema.xml is as below:

   
It works as follows:
"shoes" will match "casual shoes", "sports shoes", "shoes", etc.


Whereas I want it to match only values that start with the user query.
I.e. if the user types "shoes", I want to suggest terms that start with 
"shoes", i.e. that have the query string as a prefix in the "suggest" field 
in the index.

Please let me know how to do this.

Regards,
Ilay
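
One common way to get prefix-only matching (a sketch, not from this thread;
the type name and gram sizes are illustrative) is to analyze the whole value
with KeywordTokenizer and emit only leading-edge n-grams at index time:

<fieldType name="suggest_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Because grams are generated only from the front of the whole (untokenized)
value, "shoes" would then match "shoes" but not "casual shoes".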



Re: zookeeper config

2013-01-24 Thread Shawn Heisey

On 1/24/2013 12:58 AM, Per Steffensen wrote:
This is supported. You just need to adjust your ZK connection-string: 
"host:port/solr,host:port/solr,...,host:port/solr"
My experience has been that you put the chroot at the very end, not on 
every host entry.  For a standalone zookeeper ensemble with three nodes:


"server1:2181,server2:2181,server3:2181/mysolr1"

This is used for the zkHost parameter both on Solr startup and with the 
CloudSolrServer object from SolrJ.  The string is used without 
modification in constructing the actual ZooKeeper object down in the 
SolrCloud internals.  Here's the documentation for that object:


http://zookeeper.apache.org/doc/r3.4.5/api/org/apache/zookeeper/ZooKeeper.html#ZooKeeper%28java.lang.String,%20int,%20org.apache.zookeeper.Watcher%29
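
A minimal sketch of passing that string (hosts and chroot are illustrative;
zkHost and CloudSolrServer are the standard parameter and SolrJ class):

java -DzkHost=zk1:2181,zk2:2181,zk3:2181/mysolr1 -jar start.jar

CloudSolrServer server =
    new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/mysolr1");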

Thanks,
Shawn



Re: solr running with multi cores

2013-01-24 Thread Otis Gospodnetic
Hi,

Please search the mailing list archives - this has been discussed a few
times in the last few months.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 24, 2013 6:33 AM, "real_junlin"  wrote:

> Hi,
> Our company wants to use Solr to index our reports' data, so we are going
> to get to know Solr.
>  Solr supports multiple cores; in our system the number of cores will grow
> dynamically, and I am afraid that with more cores the performance will
> decrease dramatically. Our system will have over one hundred cores.
>
>
> What I want to know is:
> How many cores does Solr support, and up to what level does Solr run
> well?
> How does Solr allocate system resources (memory, disk space, CPU...)
> across the multiple cores?
> Is there a performance experiment on Solr running with many cores?
>
>
>
>
> Thanks
>  junlin.


solr running with multi cores

2013-01-24 Thread real_junlin
Hi,
Our company wants to use Solr to index our reports' data, so we are going to 
get to know Solr.
 Solr supports multiple cores; in our system the number of cores will grow 
dynamically, and I am afraid that with more cores the performance will 
decrease dramatically. Our system will have over one hundred cores.


What I want to know is:
How many cores does Solr support, and up to what level does Solr run 
well?
How does Solr allocate system resources (memory, disk space, CPU...) across 
the multiple cores?
Is there a performance experiment on Solr running with many cores?




Thanks
 junlin.

Re: zookeeper config

2013-01-24 Thread J Mohamed Zahoor
Cool. Thanks.


On 24-Jan-2013, at 1:28 PM, Per Steffensen  wrote:

> This is supported. You just need to adjust your ZK connection-string: 
> "host:port/solr,host:port/solr,...,host:port/solr"
> 
> Regards, Per Steffensen
> 
> On 1/24/13 7:57 AM, J Mohamed Zahoor wrote:
>> Hi
>> 
>> I am using Solr 4.0.
>> I see the Solr data in zookeeper is placed on the root znode itself.
>> This becomes a pain if the zookeeper instance is used for multiple projects 
>> like HBase and the like.
>> 
>> I am thinking of raising a Jira for putting them under a znode /solr or 
>> something like that?
>> 
>> ./Zahoor
>> 
>> 
> 



Re: setting up master and slave in same machine with diff ip's and same port

2013-01-24 Thread Upayavira
You could configure your servlet container (Jetty/Tomcat) to have
specific webapps/contexts listen on specific IP/port combinations; that
would get you some of the way. But what you are asking is more about
networking and servlet container configuration than about Solr.

Upayavira
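
A minimal sketch of one way to do that on the DEV box (addresses, device
name, and paths are illustrative; it assumes the stock example jetty.xml,
which reads the jetty.host/jetty.port system properties):

# add a second address so two IPs exist on the one DEV machine
sudo ip addr add 10.1.1.2/24 dev eth0

# run one Solr instance bound to each address, same port
# (each instance also needs its own solr home, e.g. -Dsolr.solr.home=...)
java -Djetty.host=10.1.1.1 -Djetty.port=8983 -jar start.jar   # master
java -Djetty.host=10.1.1.2 -Djetty.port=8983 -jar start.jar   # slave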

On Wed, Jan 23, 2013, at 10:48 PM, epnRui wrote:
> Hi everyone 
> 
> It's my first post here, so I hope I'm doing it in the right place. 
> 
> I'm a software developer and I'm setting up a DEV environment in Ubuntu
> with the same configuration as in PROD. (Apparently this IT department
> doesn't know the difference between a developer and a sys admin.) 
> 
> In PROD we have a Solr master and a Solr slave, on two different IPs. Let's
> say: 
> Master 192.10.1.1 
> Slave 192.10.1.2 
> 
> In DEV I have only one server: 
> 10.1.1.1 
> 
> All of them are Ubuntu servers. 
> 
> Can I put master and slave on 10.1.1.1 (DEV), without touching any
> configuration in Solr (no IP change, no port change), and still make it
> work? 
> 
> Basically, what I'm looking for is the Ubuntu server configuration I'd
> have to apply to make this work. 
> 
> Thanks a lot
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/setting-up-master-and-slave-in-same-machine-with-diff-ip-s-and-same-port-tp4035795.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Confused by queries

2013-01-24 Thread Anders Melchiorsen

Hello.

That is indeed an excellent article, thanks for pointing me at it. With
a title like that, it is no wonder that I was unable to google it on my
own.

It is probably the exception in this rule that has been confusing me:

If a BooleanQuery contains no MUST BooleanClauses, then a
document is only considered a match against the BooleanQuery
if one or more of the SHOULD BooleanClauses is a match.

So "+group:id +keyword:text" and "(+group:id) +keyword:text" mean
completely different things.

I have mostly been using the reference at
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html and it does
not mention this distinction. Quite the contrary, actually, as it says
that grouping can be used to eliminate confusion, thereby suggesting that
the usual rules of Boolean algebra apply.


Thanks again,
Anders.


On 23.01.2013 02:20, Erick Erickson wrote:

Solr/Lucene does not implement strict boolean logic. Here's an
excellent blog discussing this:

http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best
Erick

On Tue, Jan 22, 2013 at 7:25 PM, Otis Gospodnetic
 wrote:

Well, depends on what you indexed.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Jan 22, 2013 5:48 PM, "Anders Melchiorsen" wrote:


Thanks, though I am still confused.

How about this one:

manu:apple => 1 hit
+name:video => 2 hits

manu:apple +name:video => 2 hits

Solr ignores the manu:apple part completely?


Cheers,
Anders.


Den 22/01/13 23.16, Jack Krupansky skrev:


The first query:

   name:ipod OR -name:ipod => 0 hits

The "OR" and "-" are actually at the same level of the 
BooleanQuery, so

the "-" overrides the OR so it's equivalent to:

   name:ipod -name:ipod => 0 hits

For the second query:

   (name:ipod) OR (-name:ipod) => 3 hits

Pure negative queries are supported only at the top level, so the
"(-name:ipod)" matches nothing, so the query is equivalent to:

   (name:ipod) => 3 hits

You can simply insert a "*:*" to assure that it is not a pure negative
query inside the parentheses:

   (name:ipod) OR (*:* -name:ipod)

-- Jack Krupansky

-Original Message- From: Anders Melchiorsen
Sent: Tuesday, January 22, 2013 4:59 PM
To: solr-user@lucene.apache.org
Subject: Confused by queries

Hello!

With the example server of Solr 4.0.0 (with *.xml indexed), I get these
results:

*:* => 32 hits
name:ipod => 3 hits
-name:ipod => 29 hits

That is all fine, but for these next queries, I would expect to get 32
hits (i.e. everything), or at least the same number of hits for both
queries:

name:ipod OR -name:ipod => 0 hits
(name:ipod) OR (-name:ipod) => 3 hits

As my expectations are not met, I must be missing something?


Thanks,
Anders.








RE: problem in qf parameter - no results

2013-01-24 Thread Markus Jelsma
Hi,

I think it's your mm parameter, and that the terms are not matched in the 
'setctor' field.
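
One quick way to check (illustrative URL; mm and debugQuery are standard
parameters) is to relax mm and turn on debugging to see which clauses match:

localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75
publisher^0.25 sector^0.25&q=bibbia di gerusalemme&mm=1&debugQuery=true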

Cheers, 
 
-Original message-
> From:Gastone Penzo 
> Sent: Thu 24-Jan-2013 10:11
> To: solr-user@lucene.apache.org
> Subject: problem in qf parameter - no results
> 
> Hi,
> I have a problem with the qf parameter:
> 
> 
> 38 results
> localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75
> publisher^0.25&q=bibbia di gerusalemme
> 
> 0 results
> localhost:8983/solr/select/?defType=edismax&qf=title^1 author^0.75
> publisher^0.25 setctor^0.25&q=bibbia di gerusalemme
> 
> 
> the only difference is the sector field, which is:
> 
>  required="false" multiValued="true"/>
> 
> why does adding the sector field to the qf parameter make Solr return 0 products?
> 
> thank you
> 
> -- 
> *Gastone Penzo*
> *
> *
> 


RE: Issues with docFreq/docCount on SolrCloud

2013-01-24 Thread Markus Jelsma
Alright, so my suggestion of overriding HttpShardHandler to route users to the 
same replica instead of shuffling the replica URLs is doable? What about the 
comment in HttpShardHandler then?

  //
  // Shuffle the list instead of use round-robin by default.
  // This prevents accidental synchronization where multiple shards could
  // get in sync and query the same replica at the same time.
  //
  if (urls.size() > 1)
    Collections.shuffle(urls, httpShardHandlerFactory.r);
  shardToURLs.put(shard, urls);

Instead of shuffling, I would then hash the user to the correct replica if 
possible.
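
A minimal sketch of that idea (hypothetical, not actual Solr code; userKey
stands in for whatever stable per-user value the request would carry):

import java.util.Collections;
import java.util.List;

public class ReplicaAffinity {
    // Order the replica URLs deterministically from a per-user key instead
    // of shuffling, so follow-up requests prefer the same replica (and see
    // consistent docFreq stats); the other replicas remain as failover.
    public static List<String> orderForUser(List<String> urls, String userKey) {
        if (urls.size() > 1 && userKey != null) {
            int idx = (userKey.hashCode() & 0x7fffffff) % urls.size();
            Collections.rotate(urls, -idx); // replica at idx moves to front
        }
        return urls;
    }
}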

Thanks,
Markus
 
-Original message-
> From:Mark Miller 
> Sent: Thu 24-Jan-2013 00:33
> To: solr-user@lucene.apache.org
> Subject: Re: Issues with docFreq/docCount on SolrCloud
> 
> 
> On Jan 23, 2013, at 6:21 PM, Yonik Seeley  wrote:
> 
> > A solr request could request a token that when resubmitted with a
> > follow-up request would result in hitting the same replicas if
> > possible.
> 
> Yeah, this would be good. It's also useful for not catching "eventual 
> consistency" effects between queries.
> 
> - Mark


Re: Hi

2013-01-24 Thread Dmitry Kan
(start-off-topic): Alexandre, nice ideas. The last one in the *) list is a bit
far-fetched, but still good. I would add one more: how to have exact matches
and inexact matches in the same analyzed field. (end-off-topic)

On Wed, Jan 23, 2013 at 2:40 PM, Alexandre Rafalovitch
wrote:

> We need a "Make your own adventure"  (TM) Solr troubleshooting guide. :-)
>
> *) You are staring at the Solr installation full of twisty little passages
> and nuances. Would you like to:
>*) Build your first index?
>*) Make your first query?
>*) Spread your documents in the cloud?
>*) Build your own UpdateProcessor to integrate reverse Geocoding web
> service into your NLP disambiguation UIMA module to drive your More Like
> This suggestions?
>
> Well, maybe somebody with more imagination can figure out a better way to
> phrase it. Then we make a mobile app for doing this and retire
> millionaires. :-) Though that last one could make for an awesome Solr demo.
> :-)
>
> Seriously though.
>
> Thendral,
> You do need to say at least how far you got before you emailed us. Have you
> gone through the tutorial and understood it, but your own custom schema is
> giving you trouble? Have you tried indexing a Solr Update XML document
> containing the data you believe you have?
>
> You need to be able to take a large problem and split it in half, and see
> which half works and which one does not. It is a bit hard to tell from your
> description.
>
> Regards,
>Alex.
>
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Wed, Jan 23, 2013 at 7:00 AM, Upayavira  wrote:
>
> > You are going to have to give more information than this. If you get bad
> > request, look in the logs for the Solr server and you will probably find
> > an exception there that tells you what was wrong with your document.
> >
> > Upayavira
> >
> > On Wed, Jan 23, 2013, at 08:58 AM, Thendral Thiruvengadam wrote:
> > > Hi,
> > >
> > > We are trying to use Solr for indexing our application data.
> > >
> > > When we try to add a new object into solr, we are getting Bad Request.
> > >
> > > Please help us with this.
> > >
> > > Thanks,
> > > Thendral
> > >
> > > 
> > >
> > > http://www.mindtree.com/email/disclaimer.html
> >
>