[jira] [Created] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Created) (JIRA)
Case insensitive file/directory name matching will produce errors on M/R jar 
unpack. 
-

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano


This only seems to affect platforms where there can be a file/directory naming 
conflicts
from case insensitive matches. 
 
I was able to reproduce running the pseudo-distributed unit tests within OSX.

This has affected other projects: 
https://issues.apache.org/jira/browse/MAHOUT-780

I've been able to reproduce this on my local OSX install with the following 
error:
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8

Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
exclude any LICENSE matches found in the unpacked dependency jars
when the maven assembly phase hits 'jar-with-dependencies'. 

I have a patch which moves the 'jar-with-dependencies' descriptor to an 
external compile.xml file which has the proper excludes. This might also
come in handy down the road should any additional tweaks be needed to the 
compile phase. 




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

2012-03-19 Thread Eli Reisman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13232835#comment-13232835
 ] 

Eli Reisman commented on GIRAPH-157:


Hi Sebastian,

Yes I make no claims its in polynomial time or a perfect algorithm. It does 
seem to pass all the test graphs i put it too, but they are small. I would like 
to test it on a larger graph but I need to generate or find one that I have a 
worked out coloring for. Even in the small tests, I had to work out colorings 
to make sure the one it chose was a real choice and I had to confirm on 
wikipedia that the one it found was also a minimal coloring for that graph.

The algorithm just says, everyone starts with no color. Hardcoded (for now) 
SOURCE_VERTEX_ID vertex is lucky guy, chooses FIRST_COLOR and messages others.

From here on we do supersteps until everyone votes to halt. On each superstep, 
messages are passed to a vertex if one of its neighbors changes color. Each 
vertex is keeping track of a reference count of how many neighbors occupy 
which colors. Any round where messages come in, we end up checking to see if 
we need to change color. Any round where a message contains a color conflict 
with us, we oblige by choosing a new color. Choosing a new color always 
consists of looking for the lowest unoccupied color the neighbors aren't 
currently occupying. If we end up choosing a new color and it doesn't end up 
beign the color we already are, then we always message all neighbors regarding 
our old color (to remove a ref count) and our new one (to add a ref count and 
recalc their color if need be).

The trick is when you get conflicting color messages directly with your own 
color. As far as I could figure, this should only happen when you and one or 
more of your neighbors chose the same color at the same time (the last 
superstep) and in this case, I let the one with the lowest vertexId break the 
tie and keep the color. This is arbitrary, but seems to work as these ties must 
be broken and since everyone tries to choose a new color targeting the lowest 
color available according to their own reference counts, these conflicts tend 
to settle out correctly in the end (so far!). One possible optimization (or 
not?) involves letting the conflict count (a count of how many of the current 
neighbors I am in simultaneous color conflict with who have a higher (weaker) 
vertexID than the current vertex) help me guess at my next color choice. this 
should reduce the number of supersteps required to resolve a worst-case N-way 
conflict (which i believe is currently about N supersteps).

I am speaking of course of an isolated conflict. I do have the sneaking 
suspicion that given a graph that is still simple, undirected, and connected 
(thats all i made promises for!) of sufficient size and complexity, it might be 
possible to tangle this thing up in some feedback loop somehow? Further, I 
would suspect that this ends up being some sort of heuristic close-guess at a 
minimal color in some complex situations too. 

So, I'm sure its not super fast and I'm not at all sure it works at scale, but 
it does seem fairly simple and it seems to work so far. Does someone know some 
slicker algorithms that we can convert to a think like a vertex paradigm? To 
be clear, this algorithm only colors vertices too -- I don't know how to think 
like an edge yet at all!

Thanks, I welcome the feedback

Eli


 Vertex to perform graph coloring on simple, connected, undirected graphs and 
 related test.
 --

 Key: GIRAPH-157
 URL: https://issues.apache.org/jira/browse/GIRAPH-157
 Project: Giraph
  Issue Type: Test
  Components: examples, test
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-157.patch


 Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to 
 write a simple client application for Giraph to help me learn the ins and 
 outs of it. This is a simple unit test and vertex modeled after the 
 ConnectedComponentsVertex and related test. The vertex test runs whenever you 
 run the mvn test or mvn verify suite of tests. When finished processing, 
 each vertex will have an integer value that is its color.
 This is a pretty simple implementation, and although I have tested it on a 
 number of small graphs of varied trickiness and it seems to rapidly arrive at 
 a minimal coloring, its hard (for me at least) to guess which possible 
 coloring it will arrive at and I have no idea how it will do on really big 
 graphs yet without finding some more pre-colored larger test graphs to try it 
 on. Ideas anyone?
 Anyway, it was fun to put this together, and I'd 

[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: GIRAPH-159.patch

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch


 This only seems to affect platforms where there can be a file/directory 
 naming conflicts
 from case insensitive matches. 
  
 I was able to reproduce running the pseudo-distributed unit tests within OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: (was: GIRAPH-159.patch)

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor

 This only seems to affect platforms where there can be a file/directory 
 naming conflicts
 from case insensitive matches. 
  
 I was able to reproduce running the pseudo-distributed unit tests within OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-159:
-

Attachment: compile.xml
GIRAPH-159.patch

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch, compile.xml


 This only seems to affect platforms where there can be a file/directory 
 naming conflicts
 from case insensitive matches. 
  
 I was able to reproduce running the pseudo-distributed unit tests within OSX.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OSX install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira