Re: [Lucene.Net] Roadmap

2011-11-21 Thread casper...@caspershouse.com
+1 on the suggestion to move Close -> IDisposable; not being able to use a 
using statement is such a pain, and an eyesore in the code.


Although it will have to be done properly, and not just have Dispose call 
Close (you should have proper protected virtual Dispose methods to take 
inheritance into account, etc).
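
For illustration, here is a minimal sketch of the protected virtual Dispose pattern mentioned above. ResourceHolder and DerivedHolder are hypothetical names, not Lucene.Net types, and the obsolete Close forwarding to Dispose is only one possible transition path, not what the project actually does.

```csharp
using System;

// Hypothetical base class standing in for a type that used to expose Close().
public class ResourceHolder : IDisposable
{
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    // Kept only for source compatibility; simply forwards to Dispose().
    [Obsolete("Use Dispose() instead.")]
    public void Close()
    {
        Dispose();
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Subclasses override this, release their own state,
    // then call base.Dispose(disposing).
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        if (disposing)
        {
            // Release managed resources owned by this class here.
        }
        _disposed = true;
    }
}

// Hypothetical subclass showing how inheritance is taken into account.
public class DerivedHolder : ResourceHolder
{
    public bool DerivedCleanedUp { get; private set; }

    protected override void Dispose(bool disposing)
    {
        if (disposing && !DerivedCleanedUp)
        {
            DerivedCleanedUp = true; // release derived-class resources first
        }
        base.Dispose(disposing);
    }
}
```

With this shape, a caller can write using (var h = new DerivedHolder()) { ... } and both levels of the hierarchy are cleaned up exactly once.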


- Nick



From: Christopher Currens currens.ch...@gmail.com
Sent: Monday, November 21, 2011 2:56 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] Roadmap


Regarding the 3.0.3 branch I started last week, I've put in a lot of late
nights and gotten far more done in a week and a half than I expected.  The
list of changes is very large, and fortunately, I've documented it in some
files that are in the branch root of certain projects.  I'll list what
changes have been made so far, and some of the concerns I have about them,
as well as what still needs to be done.  You can read them all in detail in
the files that are in the branch.


All changes in 3.0.3 have been ported to Lucene.Net and Lucene.Net.Test,
except BooleanClause, LockStressTest, MMapDirectory, NIOFSDirectory,
DummyConcurrentLock, NamedThreadFactory, and ThreadInterruptedException.


MMapDirectory and NIOFSDirectory were never ported in the first place
for 2.9.4, so I'm not worried about those.  LockStressTest is a
command-line tool; porting it should be easy, but it's not essential to a
3.0.3 release, IMO.  DummyConcurrentLock also seems unnecessary (and
non-portable) for .NET, since it's based around Java's Lock class and is
only used to bypass locking, which can be done by passing new Object() to
the method.

NamedThreadFactory I'm unsure about.  It's used in ParallelMultiSearcher
(in which I've opted to use the TPL), and seems to be used only for
debugging, possibly testing.  Either way, I'm not sure it's necessary.
Also, named threads would mean we probably would have to move the class
away from the TPL, which greatly simplified the code and parallelization of
it all, as I can't see a way to set names for a Task.  I suppose it might be
possible, as Tasks have unique Ids, and you could use a Dictionary to map
the thread's name to the Id in the factory, but you'd have to create a
helper function that would allow you to find a task by its name, which
seems like more work than the resulting benefit is worth.  VS2010 already
has better support for debugging tasks than threads (I used it when writing
the class); frankly, it's amazing how easy it was to debug.
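
A rough sketch of the Id-to-name mapping idea described above, to show how much plumbing it would take. NamedTasks and its members are hypothetical helpers, not part of the TPL or Lucene.Net. One assumption to flag: as far as I know, Task.Id values are assigned on demand and are not strictly guaranteed to be unique, so this would only be suitable for debugging.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: associates a debug name with a Task through its Id.
public static class NamedTasks
{
    private static readonly ConcurrentDictionary<int, string> Names =
        new ConcurrentDictionary<int, string>();

    // Creates the task, records its name, then starts it, so the name is
    // already registered by the time the work begins.
    public static Task Start(string name, Action work)
    {
        Task task = new Task(work);
        Names[task.Id] = name;
        task.Start();
        return task;
    }

    // Looks up the name registered for a task, or null if there is none.
    public static string NameOf(Task task)
    {
        string name;
        return Names.TryGetValue(task.Id, out name) ? name : null;
    }

    // Name of the task that is currently executing, or null outside a task.
    public static string CurrentName
    {
        get
        {
            int? id = Task.CurrentId;
            if (!id.HasValue) return null;
            string name;
            return Names.TryGetValue(id.Value, out name) ? name : null;
        }
    }
}
```

Entries are never removed in this sketch, so a real version would also need cleanup once a task completes, which supports the point that it is more work than it is worth.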


Other than the above, the entire code base in the core dlls is at 3.0.3,
which is exciting, as I'm really hoping we can get Lucene.Net up to the
current version of Java's 3.x branch, and start working on a line-by-line
port of 4.0.  Tests need to be written for some of the collections I've
made that emulate Java's, to make sure they're even behaving the same way.
The good news is that all of the existing tests pass as a whole, so it
seems to be working, though I'd like the peace of mind of having tests for
them (being HashMap<TKey, TValue>, WeakDictionary<TKey, TValue> and
IdentityCollection<TKey, TValue>, it's quite possible any one of them could
be completely wrong in how they were put together).


I'd also like to finally formalize the way we use IDisposable in
Lucene.Net, by marking the Close functions as obsolete, moving the code
into Dispose, and eventually (or immediately) removing the Close functions.
There's so much change to the API that now would be a good time to make
that change if we wanted to.  I'm hesitant to move away from a line-by-line
port completely; I'd rather have it be as close as possible.  The
main reason I feel this way is that when I was porting the Shingle namespace
of Contrib.Analyzers, Troy had written it in a .NET way which differs
GREATLY from Java Lucene, and it did make porting it considerably more
difficult; to keep the language to a minimum, I'm just going to say it was
a pain, a huge pain in fact.  I love the idea of moving to a more .NET
design, but I'd like to maintain a line-by-line port anyway, as I think
porting changes is far easier and quicker that way.  At this point, I'm
more interested in getting Lucene.Net to 4.0 and caught up to Java than I
am in anything else, hence the extra amount of time I've put into this
project over the past week and a half.  Though this isn't really the place
for this discussion.


The larger area of difficulty for the port, however, is the Contrib section.
There are two major problems with it that are slowing me down.  First,
there are a lot of classes that are outdated.  I've found versions of code
that still have the Apache 1.1 License attached, which makes the code
quite old.  Also, it was almost impossible for me to port a lot of changes
in Contrib.Analyzers, since the code was so old and different from Java's
2.9.4.

Second, we had almost no unit tests ported for any of the classes, which
means 

Re: [Lucene.Net] Roadmap

2011-11-21 Thread casper...@caspershouse.com


Christopher,


I'd say they're not that hard to get right; the pattern for correctly 
implementing the IDisposable interface is well-established and has been 
common practice since .NET 1.0:


http://msdn.microsoft.com/en-us/library/b1yfkh5e(v=VS.100).aspx


Additionally, I said protected virtual (as per the recommendation in the 
link above).


Also agreed on the use of iterators everywhere.  Foreach is your friend.


What would be even better in some cases is using yield return, as I'm sure 
result sets don't need to be materialized everywhere as they are now.
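
To make the yield return point concrete, here is a small sketch of a lazily produced sequence. LazyResults and MatchingIds are made-up names for illustration, not Lucene.Net APIs; the Produced counter exists only to show how many items were actually generated.

```csharp
using System;
using System.Collections.Generic;

public static class LazyResults
{
    // Counts how many items the iterator has actually produced so far.
    public static int Produced;

    // Hypothetical stand-in for a result set: yields even ids one at a time
    // instead of building a full list up front.
    public static IEnumerable<int> MatchingIds(int max)
    {
        for (int i = 0; i < max; i++)
        {
            if (i % 2 == 0)
            {
                Produced++;
                yield return i;
            }
        }
    }
}
```

A caller that stops a foreach early only pays for the items it consumed; nothing past the break point is ever computed.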


- Nick



From: Christopher Currens currens.ch...@gmail.com
Sent: Monday, November 21, 2011 3:18 PM
To: lucene-net-...@lucene.apache.org, casper...@caspershouse.com
Subject: Re: [Lucene.Net] Roadmap


Some of the Lucene classes have Dispose methods, well, ones that call Close 
(and that Close method may or may not call base.Close(), as needed). 
Virtual Dispose methods can be dangerous only in that they're easy to 
implement wrong.  However, it shouldn't be too bad, at least with a 
line-by-line port, as we would make the call to the base class whenever 
Lucene does, and that would (should) give us the same behavior, implemented 
properly.  I'm not aware of differences in the JVM regarding inheritance 
and base methods being called automatically, particularly Close methods.

Slightly unrelated, another annoyance is the use of Java Iterators vs C# 
Enumerables.  A lot of our code is there simply because there are 
Iterators, but it could be converted to Enumerables.  The whole HasNext, 
Next vs C#'s MoveNext(), Current is annoying, but it's used all over in the 
base code, and would have to be changed there as well.  Either way, I would 
like to push for that before 3.0.3 is released.  IMO, small changes like 
this still keep the code similar to the line-by-line port, in that they 
don't add any difficulties in the porting process, but provide great 
benefits to the users of the code, who get a .NET-centric API.  I don't 
think it would violate the project description we have listed on our 
Incubator page, either.
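
As a sketch of how small that bridge can be, the following adapts a Java-style HasNext/Next iterator to IEnumerable<T>. IJavaStyleIterator, ArrayIterator and IteratorAdapter are hypothetical names invented for this example; the actual ported classes may look different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Java-style iterator shape, as a line-by-line port might carry it.
public interface IJavaStyleIterator<T>
{
    bool HasNext();
    T Next();
}

// Simple array-backed implementation, used here only to exercise the adapter.
public sealed class ArrayIterator<T> : IJavaStyleIterator<T>
{
    private readonly T[] _items;
    private int _pos;

    public ArrayIterator(T[] items) { _items = items; }

    public bool HasNext() { return _pos < _items.Length; }

    public T Next() { return _items[_pos++]; }
}

public static class IteratorAdapter
{
    // Wraps HasNext()/Next() so callers can use foreach and LINQ.
    public static IEnumerable<T> ToEnumerable<T>(IJavaStyleIterator<T> iterator)
    {
        while (iterator.HasNext())
        {
            yield return iterator.Next();
        }
    }
}
```

Note that the resulting sequence is single-pass, since it drains the underlying iterator; converting the base code to real Enumerables would avoid that limitation.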

Thanks,
Christopher



RE: [Lucene.Net] 2.9.4

2011-09-23 Thread casper...@caspershouse.com
Prescott,


You can do one of two things:


- Remove the volatile keyword, but keep the lock statement around the 
access to the field

- Remove the lock, and add the volatile keyword to the field


This will allow you to assign to the _infoStream variable (read/write) and 
be sure to have the most up-to-date value in the variable, as well as 
guarantee atomic reads/writes to that variable.
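
The two options above could look roughly like this. LockedHolder and VolatileHolder are hypothetical class names, and TextWriter is used instead of StreamWriter only to keep the sketch easy to exercise; neither is taken from the Lucene.Net source.

```csharp
using System.IO;

// Option 1: no volatile; every access to the field goes through the same lock.
public class LockedHolder
{
    private readonly object _sync = new object();
    private TextWriter _infoStream;

    public TextWriter InfoStream
    {
        get { lock (_sync) { return _infoStream; } }
        set { lock (_sync) { _infoStream = value; } }
    }
}

// Option 2: no lock; the field itself is marked volatile.
public class VolatileHolder
{
    private volatile TextWriter _infoStream;

    public TextWriter InfoStream
    {
        get { return _infoStream; }
        set { _infoStream = value; }
    }
}
```

Either way only the reference stored in the field is covered; calls made on the writer itself get no protection from this.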


Your example is incorrect.  The volatile on p only guarantees that 
reads/writes will be current if p is changed.  In other words, if you 
assign a new person instance to p, you can do so without using a lock 
statement and guarantee that the reads/writes from p will be atomic.


However, any calls you make on p are not protected by volatile.  
Volatile will *never* be able to protect calls; it only protects 
variables.


Lock, on the other hand, can protect calls, assuming that you cover all the 
calls with the same lock.  You can also group other operations and make 
sure that synchronization occurs.


Note that a lock *only* guarantees atomicity/mutual exclusion; when applied 
to multiple statements, there's no guarantee that you won't corrupt 
something.  If an exception is thrown inside of a lock statement (the 
second line in three lines of code in the lock block, for example) then the 
previous values don't roll back or anything.
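
A small sketch of the no-rollback point, with the exception thrown on the second of three statements inside the lock. Account and Transfer are invented for this example.

```csharp
using System;

public class Account
{
    private readonly object _sync = new object();

    public int Debits { get; private set; }
    public int Credits { get; private set; }

    // Three statements under one lock; the second one may throw.
    public void Transfer(int amount)
    {
        lock (_sync)
        {
            Debits += amount;                                          // 1: applied
            if (amount > 100)
                throw new InvalidOperationException("limit exceeded"); // 2: throws
            Credits += amount;                                         // 3: skipped
        }
    }
}
```

After a failed Transfer(500), the lock has been released but Debits is 500 and Credits is still 0; restoring consistency is up to the code, for example with a try/catch that undoes the first step.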


Because atomicity with a variable assignment is mutually exclusive around 
*one* operation, there's no chance of corruption.


Let me know if you want further clarification.


Additionally, if you have a specific patch/issue in JIRA you want me to 
look at, let me know, I'll let you know if the solution is right from a 
thread-safety point of view.


- Nick



From: Prescott Nasser geobmx...@hotmail.com
Sent: Friday, September 23, 2011 1:17 AM
To: lucene-net-dev@lucene.apache.org
Subject: RE: [Lucene.Net] 2.9.4


I see, so you're essentially saying I can simply remove the volatile 
keyword in this case, and it's exactly the same because I am only using it 
for reads and writes?

So the case I'd need to be more careful of is if a manipulation method is 
called on the object itself - suppose:

    public class person
    {
        private string _name = "Me";

        public void changeName(string n)
        {
            _name = n;
        }
    }

If one were to write

    public volatile person p = new person();

    p.changeName("You");

the call to the method would in this case need a lock (which volatile 
gives) to guarantee that changeName occurs before other items read or 
overwrite variable p?

But a straight read or write won't matter:

    p = new person();
    p = new person();
    x = p;
    p = new person();

Here, I wouldn't need the volatile keyword because those are merely reads 
and writes?




 CC: lucene-net-dev@lucene.apache.org
 From: casper...@caspershouse.com
 Date: Thu, 22 Sep 2011 23:58:42 -0400
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] 2.9.4

 Prescott,

 You really don't need to do that; reads and writes of reference fields 
 are guaranteed to be atomic as per section 5.5 of the C# Language 
 Specification (Atomicity of variable references).

 If you were doing other operations beyond the read and write that you 
 wanted to be atomic, then the lock statement would be appropriate, but in 
 this case it's not.

 The volatile keyword in this case (assuming no lock) would absolutely be 
 needed to guarantee that the variable has the most up-to-date value at the 
 time of access; using lock does this for you and makes volatile 
 unnecessary.



 - Nick







 On Sep 22, 2011, at 11:14 PM, Prescott Nasser geobmx...@hotmail.com wrote:

  Before I go replacing all the volatile fields I wanted to run this past 
  the list:

      private System.IO.StreamWriter infoStream;

  into

      private object o = new object();
      private System.IO.StreamWriter _infoStream;
      private System.IO.StreamWriter infoStream
      {
          get
          {
              lock (o)
              {
                  return _infoStream;
              }
          }
          set
          {
              lock (o)
              {
                  _infoStream = value;
              }
          }
      }

  Sorry, I don't normally deal with locks..

  Thanks for any guidance

  ~P

  @Prescott,
  Can the volatile fields be wrapped in a lock statement, and the code that 
  accesses those fields be replaced with calls to a property/method that 
  wraps access to that field?

 

 

 

 

  On Wed, Sep 21, 2011 at 1:36 PM, Troy Howard thowar...@gmail.com wrote:

  I thought it was:

  2.9.2 and before are 2.0 compatible
  2.9.4 and before are 3.5 compatible
  After 2.9.4 are 4.0 compatible

  Thanks,
  Troy

  On Wed, Sep 21, 2011 at 10:15 AM, Michael Herndon
  mhern...@wickedsoftware.net wrote:

  If that's the case, then we'll need conditional statements for including
  ThreadLocal<T>

  On Wed, Sep 21, 2011 at 12:47 PM, Prescott Nasser geobmx...@hotmail.com
  wrote:

  I thought this was after 2.9.4

  Sent

RE: [Lucene.Net] 2.9.4

2011-09-23 Thread casper...@caspershouse.com
NP



From: Prescott Nasser geobmx...@hotmail.com
Sent: Friday, September 23, 2011 9:31 AM
To: lucene-net-dev@lucene.apache.org, casper...@caspershouse.com
Subject: RE: [Lucene.Net] 2.9.4


That helps thanks. No Jira although I will put one in.


Sent from my Windows Phone



Re: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

2011-09-22 Thread casper...@caspershouse.com
, but it shouldn't be veiled in that manner.

Nor should it be said that I'm not happy to see the changes in the project 
in the last year or that I don't value the project; both couldn't be further 
from the truth.  I just don't see (yet) what it takes to bring it to the 
next level, and ultimately, to the level of the Java project (where we 
would have things like Solr, elasticsearch, etc).

- Nick

- Michael

Since I'm the one implementing Nuget into the build process and I have not
played with the nuget server or created a package, it just seemed wise to
gather feedback on how people saw themselves using the contrib packages.

I agree, and I agree with Dan Swain's opinion on the matter; have contrib 
as a separate package (with a dependency on core, obviously) and separate 
certain contrib packages out when they are significant enough to stand on 
their own.

Additionally, I'd add a Lucene.NET "all" package, which 
would wrap all of the packages/references up (it's pretty common practice, 
at least among a number of the packages that MS puts out, to have one 
package that has everything; see the Rx framework for an example).

On Wed, Sep 21, 2011 at 9:00 PM, Nicholas Paldino [.NET/C# MVP] 
casper...@caspershouse.com wrote:

 With all due respect, it's myopic opinions like yours and Michael's (his
 leans more towards apathy) which will harm the ability to get the project
 into the hands of people.

 I think (hope?) it can be agreed upon that the more that people are aware
 of Lucene.NET, the better it is for the project in general, and most
 importantly, the more potential that you have that someone will *contribute
 back* to it (and given what Lucene.NET has gone through in the past year,
 it desperately needs that participation).

 The fact of the matter is that Nuget puts packages in the hands of .NET
 developers, that leads to exposure and regardless of personal opinions on
 whether or not they *like* Nuget, it can't be denied that it's an
 *extremely* popular way to get libraries into people's projects.

 If you want to quibble over the actual numbers (and the definition of
 "extremely popular") then that's fine, but here are the numbers you want:

 http://stats.nuget.org/

 If you want to just tell that audience to take a leap, that's fine, but I
 think it would be foolish to do so.

 Additionally, given that Lucene.NET is already on Nuget, isn't there *any*
 concern that there isn't an official distro?  Aren't you concerned about
 the integrity of the brand that so many of you fought to keep alive over the
 past year?  There's no guarantee that what's on Nuget will be the official
 releases/builds that come out of this project, and I'm a little surprised
 there isn't more concern over that aspect either.

 Just my $0.02

 - Nick

 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Wednesday, September 21, 2011 7:06 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 I am not against it, but personally think of it as a toy.
 I am from the generation where people used vi to write code.

 DIGY

 -Original Message-
 From: Aaron Powell [mailto:m...@aaron-powell.com]
 Sent: Thursday, September 22, 2011 1:56 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 Any particular reason you guys are not interested in NuGet?

 Aaron Powell
 MVP - Internet Explorer (Development) | FunnelWeb Team Member

 http://apowell.me | http://twitter.com/slace | Skype: aaron.l.powell |
 Github | BitBucket


 -Original Message-
 From: Digy [mailto:digyd...@gmail.com]
 Sent: Thursday, 22 September 2011 7:42 AM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 Sorry, but I feel the same as Neal.

 DIGY

 -Original Message-
 From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
 Sent: Wednesday, September 21, 2011 6:08 PM
 To: lucene-net-dev@lucene.apache.org
 Subject: RE: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 No interest in Nuget whatsoever.

 - Neal

 -Original Message-
 From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
 Sent: Tuesday, September 20, 2011 10:57 PM
 To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
 Subject: [Lucene.Net] Nuget, Lucene.Net, and Your Thoughts

 We're taking a quick poll over the next few days to see how people would
 like to use Lucene.Net through Nuget on the developers mailing list**

 Currently version 2.9.2 is hosted on nuget.org, but that package was not
 created by the project maintainers, thus nuget is not currently set up in
 source.  Going forward, we would like to continue what someone else started
 by creating nuget packages for Lucene.Net.

 Right now there are two packages: Lucene & Lucene.Contrib.  My question to
 the community is do you wish to have finer-grained packages, i.e. a package
 for each contrib