

[silviuc] first commit

[silviuc] Initial push.

[silviuc] Several refactorings in preparation for making the repo public.

[silviuc] Small fixes in BigQuery snippets and wordcount example.

[silviuc] Python Dataflow fit-n-finish.

[silviuc] README: add explicit Table of Contents.

[silviuc] Some more fixes related to argument passing.

[silviuc] Readme: add a missing section anchor, close all anchors.

[silviuc] "README" edit from Robert.

[silviuc] Code snippets for Web doc on PipelineOptions.

[silviuc] Depend on google-apitools-dataflow-v1b3 >= 0.4.20160217

[silviuc] Performs several updates to doc snippets for PipelineOptions.

[silviuc] Validate pipeline options at the time of pipeline creation.

[silviuc] Simplify whitelist warning to show warning before every run.

[silviuc] Support combiner lifting, update batch job major version to 4

[silviuc] Adding required options to the remote execution snippet

[silviuc] Add protobuf as dependency to address namespace sharing issue

[silviuc] Improve pickling robustness

[silviuc] Update dill to version 0.2.5

[silviuc] Support for staging SDK tarball downloaded from github

[silviuc] Initialize worker logging earlier

[silviuc] Display a warning when pipeline option runner is not specified

[silviuc] Apply format string to log message only when there are args

[silviuc] Change "is_streaming" pipeline option to "streaming"

[silviuc] Many pickling fixes.

[silviuc] Improve the str() output of various Dataflow classes

[silviuc] Disallow (broken) pickling of generators.

[silviuc] Accept arbitrary objects as first input to the logger

[silviuc] Support timer-based triggers, watermark holds in streaming jobs

[silviuc] Display a warning only when pipeline option runner is not specified

[silviuc] Accept GCS paths as extra packages

[silviuc] Snippets for type hint docs.

[silviuc] Adds a cookbook example to illustrate the usage of side inputs.

[silviuc] Provide __str__() for ShuffleEntry and ShuffleKeyValuesIterable

[silviuc] Remove byte counters, pending better size estimates

[silviuc] Remove no longer needed protobuf dependency

[silviuc] Internal changes for maintaining documentation

[silviuc] Improve labeling of transforms.

[silviuc] Add worker_harness_container_image option

[silviuc] Initialize an empty executed_operations on MapTask

[silviuc] Internal documentation updates for GitHub

[silviuc] Change whitelist warning URL to the signup form

[silviuc] Updates bigquery source/sink to use executing project by default.

[silviuc] Remove windmill host and port defaults from streaming worker

[silviuc] Performs two small updates to progress reporting.

[silviuc] Allow the sdk_location to point to a tarball as well as a directory

[silviuc] Eliminate the fallback to dill when pickling fails for data

[silviuc] Implement continuous combining in pre-shuffle combining table.

[silviuc] Use the singleton pattern in logger to only create one object.

[silviuc] Updates to internal documentation for package installation

[silviuc] Updates MapTask._parse_avro_source() so that start and end position of

[silviuc] Optimize shuffle writing

[silviuc] Cythonize runners.common, worker.executor, and utils.counters

[silviuc] Add message with Dataflow monitoring URL for the submitted job

[silviuc] Marking WorkItems DONE before releasing the lock.

[silviuc] Run batchworker do worker under cProfile

[silviuc] Logging pipeline messages snippet

[silviuc] Further optimizations in

[silviuc] Optimize logging context

[silviuc] Forbid use of PubSub I/O in batch and local jobs

[silviuc] WordCount, minimal WordCount, and debugging WordCount snippets

[silviuc] Monitoring interface snippets.

[silviuc] Change option names to: --worker_machine_type and --worker_disk_type

[silviuc] After a job fails, wait for any error messages to show up

[silviuc] Handle two versions of oauth2client (>=2.0.0 and 1.5.2)

[silviuc] Disable VarIntCoder for long values

[silviuc] Remove some deprecated names.

[silviuc] Update the short link for the Alpha signup form

[silviuc] Fixed the example docs to use the correct name for staging_location

[silviuc] Bump up version to 0.2.1

[silviuc] Reformat some doc strings to be acceptable to Pydocs Sphinx

[silviuc] Update some str methods for recent SDK representation changes

[silviuc] Store timestamps and time intervals with microsecond granularity

[silviuc] Better error messaging on missing gcloud

[silviuc] Explicitly set all required pipeline options in the minimal example

[silviuc] New class ReceiverSet in the worker

[silviuc] Avoid calling logging.debug on every element

[silviuc] Don't use KV coder for ungrouped shuffle reads/writes

[silviuc] Rename OutputTimeFn.OUTPUT_AT_MAX to OUTPUT_AT_LATEST for clarity

[silviuc] Clearer error message when SDK file cannot be found to stage

[silviuc] Add support for deduplicating id_label in PubSubSource

[silviuc] Several fixes related to schema specified when creating a BigQuery 

[silviuc] Add TimestampCoder for timeutil.Timestamp objects

[silviuc] README: fix pip install link

[silviuc] Cythonize Timestamp- and FloatCoder

[silviuc] Make sdk pipeline options available in the DoFn context

[silviuc] Report work item exceptions in the streaming worker

[silviuc] Add plumbing to pass a coder to the byte-size counter updater

[silviuc] Remove sdk pipeline options from the DoFn context

[silviuc] Add reference counting for consumers of AppliedPTransform outputs

[silviuc] Support pagination for large list states in Windmill

[silviuc] Implement and use WindowedValue.with_value

[silviuc] Bump up version to 0.2.2

[silviuc] Cythonize Timestamp- and FloatCoder

[silviuc] Make sdk pipeline options available in the DoFn context

[silviuc] Remove sdk pipeline options from the DoFn context

[silviuc] Use a CounterFactory to create counters

[silviuc] Set zip_safe=False in

[silviuc] Updates and simplifies logic related to progress reporting.

[silviuc] Clean up PValue and PCollection with clearer argument passing

[silviuc] Updates BatchWorker to report failure to shutdown progress reporter to

[silviuc] Internal testing change.

[silviuc] Declare namespace packages in

[silviuc] Add logging for memory footprint debugging in direct runner

[silviuc] Add class Accumulator

[silviuc] Add class ObservableMixin

[silviuc] Fixes a bug in progress reporting in TextFileReader

[silviuc] Use reraise_augmented for start/finish operations

[silviuc] Make element iterators observable

[silviuc] Add class ByteCountingOutputStream

[silviuc] Break up OperationCounters.update() into before and after pieces

[silviuc] Implement aggregated_values for DirectPipelineRunner

[silviuc] Allow operations to override the coder passed to update_counters

[silviuc] Remove version pins for google-apitools and oauth2client packages

[silviuc] Renames Source/Reader classes for native sources/readers.

[silviuc] Treat creation of side input views as a PTransform

[silviuc] Remove perf regression in not yet finished size estimation code

[silviuc] Bump up version to 0.2.3

[silviuc] Adding BYTES to the possible data type options for 

[silviuc] Pipeline and runner cleanup

[silviuc] Use WindowFn-specified Coders for the Windows

[silviuc] Return PDone as the result of a NativeWrite

[silviuc] Adds dynamic work rebalancing support for InMemoryReader.

[silviuc] Consistently apply sharding suffix to TextFileSink

[silviuc] Don't even try to run Cython on Windows

[silviuc] Modify --requirements_file behavior to cache locally packages

[silviuc] Add check for SDK versus container language/version compatibility

[silviuc] Fix incorrectly cached values in pvalue.AsList

[silviuc] Enable support for all supported counter types

[silviuc] Rolling back due to an internal test failure.

[silviuc] Support large iterable side inputs

[silviuc] Implement non-native TextFile Sink

[silviuc] Improve FileSink's documentation.

[silviuc] Update equal_to matcher with clearer error message

[silviuc] Fix issue in cache trimming logic for combiner lifting

[silviuc] Rename GlobalWindows.WindowedValue to GlobalWindows.windowed_value

[silviuc] At 0 progress don't pretend to be 1 byte done.

[silviuc] Add a warning if trying to run on anything but Python 2.7

[silviuc] Generalize base PTransform._extract_input_pvalues

[silviuc] Bump up version to 0.2.4

[silviuc] Implement EagerPipelineRunner, useful for running in a repl.

[silviuc] Enable gzip compression on text files sink.

[silviuc] Create separate worker version file

[silviuc] Add utility function to check compression type validity.

[silviuc] Use worker harness container corresponding to SDK version

[silviuc] Is_composite to return True instead of parts when there are parts

[silviuc] Adds the base API for creating new sources.

[silviuc] Dynamic work rebalancing support for InMemory reader.

[silviuc] Skip modules without a __name__ attribute

[silviuc] New method OperationCounters.should_sample

[silviuc] Adds support for reading custom sources using DataflowPipelineRunner.

[silviuc] Ignore undeclared side outputs of DoFns in cloud executor

[silviuc] Remove separate worker version file

[silviuc] Internal rollback.

[silviuc] Undo introduction of OperationCounters.should_sample

[silviuc] Use shelve as a disk backed dictionary optionally in PValueCache

[silviuc] Update filehandling utilities

[silviuc] Bump up version to 0.2.5

[silviuc] Remove separate worker version file

[silviuc] Internal changes for documentation validation

[silviuc] Fix module dict pickling.

[silviuc] Make retry logic idempotent in GcsIO.delete and GcsIO.rename

[silviuc] Fix buffer overruns in fast OutputStream implementation

[silviuc] Introduce OperationCounters.should_sample

[silviuc] Allow Pipeline objects to be used in Python with statements

[silviuc] Undo introduction of OperationCounters.should_sample

[silviuc] Augment file utils with recursive copy

[silviuc] Add autoscaling pipeline options

[silviuc] Bump up version to 0.2.6

[silviuc] Reintroduce OperationCounters.should_sample

[silviuc] Fix is_service_runner to detect endpoints ending with /

[silviuc] Implement fixed sharding in Text sink.

[silviuc] Remove unused GcsIO class attribute

[silviuc] Raise an IOError when source file in GcsIO.copy does not exist

[silviuc] Use multiple file rename threads in finalize_write

[silviuc] Retry idempotent I/O operations on GCS timeout

[silviuc] Bump up version to 0.2.7

[silviuc] Remove worker code

[silviuc] Move all files to apache_beam folder

[silviuc] Rewrite imports and usage to apache_beam

[silviuc] Remove google folder

[silviuc] Update and files for Apache Beam

[dhalperi] Fix the licenses (add or update)

[davor] Clean up usage of temp directories in _stage_extra_packages

[davor] Use a DirectExecutor for Watermark Callbacks

[davor] Execute NeedsRunner tests in the Direct Runner

[davor] [BEAM-334] DataflowPipelineRunner: bump environment major version

[davor] [flink] fix potential NPE in ParDoWrapper

[davor] Port cleanupDaemonThreads fix to archetype module

[davor] Add success/failure counters to new PAssert mechanism

[davor] Changed Word Counts to use TypeDescriptors.

[davor] Updated complete examples to use TypeDescriptors.

[davor] [BEAM-336] update examples-java README

[davor] Make example AddTimestampFn range deterministic

[davor] Add configuration for Dataflow runner System.out/err

[davor] Fix AutoComplete example streaming configuration

[davor] CompressedSourceTest: simplify

[davor] Revert GBK-based PAssert

[davor] Update Pipeline Execution Style in WindowedWordCountTest

[davor] Update Direct Module tests to explicitly set Pipeline

[davor] Use TestPipeline#testingPipelineOptions in IO Tests

[davor] Move GcsUtil TextIO Tests to TextIOTest

[davor] Set Runner in DataflowRunner Tests

[davor] Increase Visibility of Flink Test PipelineOptions

[davor] Update the Default Pipeline Runner

[davor] Use TimestampedValue in DoFnTester

[davor] Add DoFnTester#peekOutputValuesInWindow

[davor] Rename DoFnTester#processBatch to processBundle

[davor] Explicitly set the Runner in TestFlinkPipelineRunner

[davor] Package javadoc for org.apache.beam.sdk.transforms.display

[davor] Fix NullPointerException in AfterWatermark display data

[davor] Run NeedsRunner tests in Runner Core on the DirectRunner

[davor] Reuse UnboundedReaders in the InProcessRunner

[davor] Modified range tracker to use first response seen as start key

[davor] Remove DoFnRunner from GroupAlsoByWindowsProperties

[davor] Remove the DirectPipelineRunner from the Core SDK

[davor] Update DataflowPipelineRunner worker container version

[davor] Rename InProcessPipelineRunner to DirectRunner

[davor] Remove InProcess Prefixes

[davor] Fix type error in Eclipse

[davor] Improve BigQueryIO validation for streaming WriteDisposition

[davor] [Spark] Elide assigning windows when WindowFn is null

[davor] Roll-forwards: Base PAssert on GBK instead of side inputs

[davor] Replace GcsPath by IOChannelFactory in WordCount.

[davor] Add test for ReduceFnRunner GC time overflow

[davor] Fix overflow in ReduceFnRunner garbage collection times

[davor] Added BigDecimal coder and tests.

[davor] Add BigIntegerCoder and tests

[davor] Touch up BigDecimalCoder and tests

[davor] [BEAM-342] Implement Filter#greaterThan,etc with Filter#byPredicate

[davor] CrashingRunner: cleanup some code

[davor] Remove the beam.examples dependency from flink.

[davor] Remove last vestige of the words DirectPipeline

[davor] Remove references to javax.servlet.

[davor] Finish removing DirectPipelineRunner references

[davor] Rename DataflowPipelineRunner to DataflowRunner

[davor] Turn on failOnWarning and ignore unused runners modules in example.

[davor] [BEAM-321] Fix Flink Comparators

[davor] Remove Pipeline from TestDataflowPipelineRunner

[davor] Configure RunnableOnService tests for Spark runner, batch mode

[davor] DataflowPipelineJob: Retry messages, metrics, and status polls

[davor] Rename FlinkPipelineRunner to FlinkRunner

[altay] Use item equality in apply_to_list test

[dhalperi] Implements a framework for developing sources for new file types.

[davor] Travis config for python tests

[github] correct pip install target to include 'archive/'

[dhalperi] Update juliaset example to support "pip install"

[dhalperi] Pylint integration for Python SDK

[altay] Remove the whitelisting required warning

[dhalperi] Enables more linting rules.

[altay] Disable Java specific Travis tests for python-sdk branch

[silviuc] Remove internal test config file

[silviuc] Fix gcsio.exists call

[altay] Fix expression-not-assigned and unused-variable lint warnings.

[altay] Enables unused-import and used-before-assignment rules

[robertwb] Get current SDK package from PyPI instead of GitHub

[robertwb] Define GOOGLE_PACKAGE_NAME and use it everywhere

[robertwb] Replace call() with check_call()

[dhalperi] Move Jenkins Python post commit script to the repository.

[altay] Set end value in and remove pylint disable statement.

[silviuc] Use the beamhead label for containers

[dhalperi] Enable linter rules no-self-argument, reimported, ungrouped-imports

[robertwb] pipeline.options should never be None

[robertwb] Cleanup dataflow_test.

[robertwb] Remove unneeded label argument in ptransform_fn

[robertwb] Better error message for poor use of callable apply

[robertwb] Add support for ZLIB and DEFLATE compression

[altay] Uncomment tox in the postcommit script.

[dhalperi] Making the dataflow temp_location argument optional

[dhalperi] Update Python aggregator example to match Java usage

[dhalperi] Allow ".tar" files in extra_packages

[robertwb] Remove ptransform tests from the excluded tests list

[robertwb] Internal cleanup.

[dhalperi] Fix warnings that came with newly released Pylint (1.6.1) version.

[chamikara] Adds more code snippets.

[robertwb] Pickle only used symbols from __main__ namespace.

[robertwb] Fix lint error.

[dhalperi] Remove more tests from nose tests exclusion list

[robertwb] Fixes bug due to accessing cached pvalues multiple times.

[robertwb] Made checksum_output optional in

[robertwb] Add type hints to bigshuffle to avoid pickle overhead.

[robertwb] Fix typo in Dataflow runner monitoring message

[robertwb] Added to the list of Cythonized Python SDK files.

[robertwb] Temporarily reverting pickler changes (@4e2d8ab).

[robertwb] DoOutputsTuple cleanup

[robertwb] Accept runners by fully qualified name.

[robertwb] Cleanup known runners code.

[robertwb] Fix min and max timestamp on 32-bit machines

[robertwb] Update some of the example tests to use assert_that

[dhalperi] Handle HttpError in GCS upload thread

[robertwb] Fixes several issues related to 'filebasedsource'.

[ccy] Make step encodings consistently use WindowedValueCoders

[dhalperi] Python in process runner with bundled execution

[dhalperi] Clarifies that 'TextFileSource' only supports UTF-8 and 

[dhalperi] Start/finish bundle methods do not take extra args anymore

[robertwb] Remove pipeline.apply(pvalue, callable)

[robertwb] Fixing broken example tests

[robertwb] Implement coder optimized for coding primitives.

[robertwb] Used fast primitives coder as fallback coder.

[robertwb] Add fast support for dicts for default coder.

[robertwb] Add >> operator for labeling PTransforms.

[robertwb] Fix comment.

[dhalperi] Adds a test harnesses and utilities framework for sources.

[dhalperi] Check type of coder for step feeding into GroupByKey in Dataflow 

[dhalperi] Adds a PTransform for Avro source.

[robertwb] Remove "beamhead-02" container workaround

[robertwb] Add size-estimation support to Python SDK Coders

[robertwb] Address Robert's comments.

[robertwb] Cythonize WindowedValue class.

[robertwb] Add Cython DoFnContext and Receiver stubs.

[robertwb] Reduce the number of elements in the pvalue caching test.

[robertwb] Clarifying comments.

[robertwb] Remove expensive per-element-step logging context.

[robertwb] Cache dofn.process method.

[robertwb] Restore (faster) logging context.

[robertwb] Minor cdef value changes.

[robertwb] Add tests for WindowedValue.

[altay] Log all exceptions in _start_upload

[silviuc] Refactor to separate strings/versions

[silviuc] Make save_main_session optional

[silviuc] Refactor examples to use save_main_session

[robertwb] Fix multi-input named PTransforms.

[robertwb] Move names out of transform constructors.

[robertwb] Fix error messages for externally named PTransforms.

[robertwb] Cleanup and fix combiners_test.

[robertwb] fix pipeline test

[robertwb] Fixes examples

[robertwb] Fix label-sensitive test.

[robertwb] Lint fixes.

[robertwb] fixup: failing tests expecting name

[robertwb] Final cleanup pass.

[robertwb] Make DoFnRunner a Receiver.

[robertwb] Receiver and LoggingContext adapters.

[robertwb] Allow passing logging context directly.

[ccy] Fix SDK name and version sent to the Cloud Dataflow service

[ccy] Update docstring.

[robertwb] Make TextFileReader observable

[robertwb] Better top implementation.

[robertwb] Allow Top operations to take key argument rather than compare.

[robertwb] Optimize Map and Flatmap when there are no side inputs.

[mariand] Increased the GCS buffer size from 1MB to 8MB and introduced a 128kB

[altay] Fix typo in combiners test.

[chamikara] Deletes some code that is not used by SDK.

[chamikara] Updates json to/from Python object  conversion to properly handle 

[robertwb] Implement add_input for all CombineFns.

[robertwb] Document TupleCombineFns

[chamikara] Fixes GcsIO.exists() to properly handle files that do not exist.

[chamikara] fixup! updates an error message and adds a test for error path.

[dhalperi] Use the cythonized DoFnContext everywhere.

[klk] fixed typo in

[dhalperi] Fix hashing and comparison for compression types

[dhalperi] [BEAM-378] integrate setuptools in Maven build

[altay] Revert the changes to the accidentally reverted files

[dhalperi] Improve error handling in

[altay] Fixing the juliaset example

[dhalperi] Remove egg_info from setup.cfg

[fy] Fix typo in comment

[dhalperi] Move native TextFileWriter to use GcsIO for writing

[dhalperi] Allow Google Cloud Dataflow workflows to use ".dev" workers

[dhalperi] Updates FileBasedSource so that sub-class can prevent splitting to 

[chamikara] Updates SourceTestBase concurrent splitting test to share thread 

[dhalperi] Update Python examples

[dhalperi] Fixing the custom ptransform snippet in the comments

[dhalperi] Add support for reading compressed files.

[dhalperi] Support mode attribute on GCS files

[dhalperi] Update 404 link to setuptools docs

[ccy] Use cStringIO instead of StringIO

[dhalperi] Making Dataflow Python Materialized PCollection representation more

[dhalperi] Refactoring code in to allow for re-use.

[ccy] Insert a shuffle before write finalization

[github] Allow pickling of UnwindowedValues instances

[robertwb] Add unit test for unwindowed iterator pickling.

[robertwb] Adds a text source to Python SDK.

[robertwb] Removed unnecessary throttling of rename parallelism.

[robertwb] Use sys.executable and "-m pip" to ensure we use the same Python and 

[robertwb] Changed ToStringCoder to BytesCoder in test

[robertwb] Updates lint configurations to ignore generated files.

[robertwb] Updates Dataflow API client.

[robertwb] Adds support for specifying a custom service account.

[github] Insert global windowing before write results GBK

[dhalperi] Set allow_nan=False on bigquery JSON encoding

[chamikara] Adds __all__ tags to source modules.

[robertwb] Better documentation for CompressionTypes.

[robertwb] Using strings instead of integers for identifying CompressionTypes.

[robertwb] Minor cleanups in docstrings and error messages.

[robertwb] Implement liquid sharding for concat source.

[robertwb] Move ConcatSource to iobase.

[robertwb] Allow ConcatSource to take SourceBundles rather than raw Sources

[robertwb] Move ConcatSource into its own module.

[robertwb] Add BEAM_PYTHON environment override to set the python executable

[robertwb] Ignore virtualenv environment in git

[robertwb] Cleanup temporary files in textio and avroio tests.

[robertwb] Fix and add test for ReadFromAvro transform.

[robertwb] Implement avro sink.

[robertwb] Fix python bin test.

[robertwb] Allow .whl files to be staged with --extra_package

[robertwb] Compress serialized function data.

[robertwb] Add annotation to mark deprecated or experimental APIs via 

[robertwb] Use keyword arguments in fnc calls

[chamikara] Updates filebasedsource to support CompressionType.AUTO.

[robertwb] Add equality methods to range source.

[robertwb] Fixes a bug in on Windows.

[mariagh] Use absolute path for import

[robertwb] Making sure that GcsBufferedReader implements the iterator protocol

[robertwb] Fixes issue with Travis CI and Mac images.

[robertwb] Move dataflow native sinks and sources into dataflow directory.

[robertwb] Avoid circular imports.

[robertwb] Move explicit references to _NativeWrite.

[robertwb] Remove direct references to iobase.Native*

[robertwb] Import Native* in iobase for backwards compatibility.

[altay] Add license to init files.

[robertwb] Add profile_memory flag

[robertwb] Add support for bz2 compression

[robertwb] implement code review feedback

[robertwb] Post-merge fixup.

[altay] change required version for oauth2client

[robertwb] Fixed pip requirement.

[robertwb] Added cython version check

[robertwb] Enhancements and deprecation cleanups related to file compression.

[robertwb] A few test fixes and other cleanups.

[robertwb] Disallow (unimplemented) windowed side inputs.

[robertwb] Fix tests unnecessarily using windowed side inputs

[robertwb] Add support for experiments

[robertwb] Document that source objects should not be mutated.

[robertwb] Adds an assertion to source_test_utils for testing reentrancy.

[robertwb] Windowed side input test.

[robertwb] Implement windowed side inputs for direct runner.

[robertwb] Fix tests expecting list from AsIter.

[robertwb] Implement windowed side inputs for InProcess runner.

[robertwb] More complicated window tests.

[robertwb] Optimize globally windowed side input case

[robertwb] Minor fixups for better testing

[robertwb] Rename from_iterable to avoid confusion.

[robertwb] Close threadpools when finished with them

[robertwb] Better error for missing job name

[mariagh] Add alias (--extra_packages) for --extra_package

[fjp] Update README to reflect Dataflow to Apache Beam migration

[robertwb] Pin the version of dependencies

[robertwb] Limit version ranges for dependencies in the 0.* version range

[vikasrk] update parent pom version in python/pom.xml

[robertwb] Laying down infrastructure for static display data

[robertwb] Adding license text to all files. Fixing one lint issue. Refactoring

[robertwb] Moving files. Using DisplayDataItem to enable dictionaries to be used as

[robertwb] Adding documentation. Setting Python classes as STRING types.

[robertwb] Addressing comments

[robertwb] Removing superfluous TODO after unittest is passing

[robertwb] Compare display data items, not dicts.

[robertwb] Improving error of missing sdk_location

[robertwb] Pushing check down. Rewriting error message

[robertwb] Adds IterableCoder, fast coding for sets, booleans.

[robertwb] Several fixes to coder size estimates.

[robertwb] Implement size observation for FastPrimitivesCoderImpl.

[robertwb] Move Timestamp and related classes into apache_beam/utils.

[robertwb] Remove fake Timestamp and WindowedValue hacks.

[robertwb] Improvements related to size estimation.

[robertwb] Fix a couple more coder vs. element-coder changes for element sizing.

[chamikara] Fixes two bugs in avroio_test 'test_corrupted_file'.

[robertwb] Updated readme according to BEAM-693

[robertwb] BEAM-873 Support for BigQuery 2 SQL

[robertwb] Add unittests for BQ 2.0

[robertwb] Checking for integer types in json conversion

[robertwb] Adding unit tests

[robertwb] Make the BQ input for dataflow runner and local runner is identical 

[robertwb] Document the input and output of type conversions

[robertwb] Add support for proto coder

[robertwb] DeterministicCoder rename

[robertwb] Fix cythonization

[robertwb] Don't default to PickleCoder for sources.

[robertwb] Generic ordered position range tracker.

[robertwb] Implement key range tracker.

[robertwb] Renames InprocessPipelineRunner to DirectPipelineRunner and removes 

[robertwb] Add a test to check memory consumption of the direct runner.

[robertwb] Add hamcrest dependency.

[robertwb] Adding display data to sink, sources, and parallel-do operations.

[robertwb] Optimize WindowedValueCoder

[robertwb] Remove tox cache from previous workspace

[robertwb] DirectPipelineRunner bug fixes.

[altay] Remove the inline from WindowedValue.create()

[robertwb] [BEAM-852] Add validation to file based sources during create time

[robertwb] Use batch GCS operations during FileSink write finalization

[robertwb] Add IP configuration to Python SDK

[robertwb] Allow for passing format so that we can migrate to BQ Avro export 

[robertwb] Add a couple of missing coder tests.

[robertwb] Also check coder determinism.

[robertwb] Display Data for: PipelineOptions, combiners, more sources

[robertwb] Fix merge lint error

[robertwb] Query Splitter for Datastore v1

[vikasrk] Upgrade Datastore version

[robertwb] Fix shared state across retry decorated functions

[altay] Fix the flaky test_model_multiple_pcollections_partition test

[robertwb] Fixes a couple of issues of FileBasedSource.

[robertwb] Remove redundant REQUIRED_PACKAGES

[robertwb] Fix issue where batch GCS renames were not issued

[robertwb] Improve GcsIO throughput by 10x

[lcwik] A few improvements to Apache Beam Python's FileIO.

[lcwik] Handling the 'collision' case for UIDs and also augmenting comments.

[lcwik] Fixing lint warnings related to indentation.

[davor] Add missing fields to the retry decorator

[altay] Make create() available to pure python callers

[bchambers] Fixing error with PipelineOptions DisplayData of lists

[altay] fixing reviewer comments

[tgroh] Support @ValidatesRunner(RunnableOnService) in Python [1/2]

[mariagh] Remove tests for merge

[davor] Add DatastoreIO to Python SDK

[davor] Update StarterPipeline

[davor] Add JUnit category for stateful ParDo tests

[davor] Reject stateful DoFn in SparkRunner

[davor] Reject stateful DoFn in FlinkRunner

[davor] Reject stateful DoFn in ApexRunner

[davor] Simplify the API for managing MetricsEnvironment

[davor] Output Keyed Bundles in GroupAlsoByWindowEvaluator

[davor] Add TransformHierarchyTest

[davor] Use more natural class to find class loader in ReflectHelpers

[davor] Update transitive dependencies for Apex 3.5.0 snapshot version.

[dhalperi] datastoreio write/delete ptransform

[davor] Improve the speed of getting file sizes

[dhalperi] Update googledatastore version

[tgroh] Support ValidatesRunner Attribute in Python

[vikasrk] Few datastoreio fixes

[robertwb] Parse table schema from JSON

[robertwb] Improve size estimation speed for file samples

[robertwb] Add snippet for standard sql

[robertwb] auth: add application default credentials as fallback

[robertwb] Add snippet for datastoreio

[sourabhbajaj] Do not need to list all files in GCS for validation. Add limit field to

[robertwb] Fix auth related unit test failures

[robertwb] Make the legacy SQL flag consistent between Java and Python

[robertwb] Add labels to lambdas in write finalization

[robertwb] Call from_p12_keyfile() with the correct arguments.

[robertwb] Removing a bug in .travis.yml that makes the build fail.

[robertwb] Modify create_job to allow staging the job and not submitting it to 

[vikasrk] Add experimental warning to datastoreio

[altay] Add missing job parameter to the submit_job_description.

[sourabhbajaj] Change export format to AVRO for BQ

[sourabhbajaj] Rollback the default format to json

[robertwb] Move template_runners_test to runners folder.

[robertwb] Fix the pickle issue with the inconsistency of dill load and dump

[bchambers] Display data keys in Python should be snake_case

[robertwb] [BEAM-1077] @ValidatesRunner Test in Python Postcommit

[ccy] Fix template_runner_test on Windows

[robertwb] Add reference to the >> and | operators for pipelines.

[robertwb] Handle empty batches in GcsIO batch methods

[robertwb] Fix a typo in query split error handling

[robertwb] [BEAM-1109] Fix Python Postcommit Test Timeout

[robertwb] Add more documentation to datastore_wordcount example

[robertwb] [BEAM-1124] Temporarily Ignore a ValidatesRunnerTest That Broke

[altay] Do not test pickling native sink objects

[sourabhbajaj] Update the BQ export flat from Json to Avro

[robertwb] Rename PTransform.apply() to PTransform.expand()

[robertwb] Update Apitools to version 0.5.6

[robertwb] Fixing inconsistencies in PipelineOptions

[robertwb] Add support for date partitioned table names

[github] Fix typo in error message

[lcwik] Add experiments alias for DebugOptions

[robertwb] Fixing postcommit error caused by PR 1526.

[robertwb] Allow disabling flattening of records types

[robertwb] Add Hamcrest To Tox For autocomplete_test Execution

[robertwb] [BEAM-1112] Python E2E Test Framework And Wordcount E2E Test

[dhalperi] Use CountingSource in ForceStreamingTest

[dhalperi] Add Parameters to finishSpecifying

[dhalperi] BEAM-1203 Fixed exception when creating zip entries during Apex YARN

[dhalperi] Update python-sdk pom.xml version and fixup to merge errors.

[robertwb] Decreasing the number of copies of things in scope for reduced peak

[robertwb] Instance variable rename.

[robertwb] Using the "del" keyword to more explicitly document the memory

[robertwb] Remove the word 'Pipeline' from the PipelineRunner subclasses.

[robertwb] Add a deprecation warning about BlockingDataflowRunner.

[robertwb] Skip the test_memory_usage test on macos.

[robertwb] Fixing staging/temp location comment.

[vikasrk] Assert transform without side inputs

[markliu] Fix Incorrect State Usage in PipelineVerifier Unit Test

[dhalperi] Fixing test_default_job_name case

[dhalperi] A few memory and IO optimizations in Avro and FileIO

[dhalperi] [BEAM-1221] Run Wordcount IT in Postcommit Using Test Framework

[lcwik] [BEAM-1226] Add support for well known coder types to Apache Beam python

[dhalperi] [BEAM-1112] Improve Python E2E Test Framework

[dhalperi] Rename ->

[bchambers] [BEAM-147] Adding Metrics API to Python SDK

[robertwb] Fixed the example usage in

[robertwb] Updates Python SDK examples to use Beam text source/sink.

[robertwb] Updated ptransform.apply() to ptransform.expand() in the comments.

[robertwb] Update README examples to use the new io APIs

[robertwb] Improve performance of fileio._CompressedFile

[robertwb] To use @unittest.skip to skip avroio_test cases when snappy is not

[robertwb] Update to get rid of 'incubating' notion.

[altay] Remove obsolete teardown_policy argument

[altay] Remove the pipeline_type_check option

[robertwb] Create TFRecordIO, which provides source/sink for TFRecords, the

[robertwb] Provided temporary directory management for test cases.

[Pablo] Adding protobuf matchers for dataflow client.

[robertwb] Only the first occurrence of each typehint related warning message.

[robertwb] Compressed file with missing last EOF create a fake element

[chamikara] Updates snippets to use Beam text source and sink.

[robertwb] [BEAM-1188] Python File Verifier For E2E Tests

[robertwb] [BEAM-1188] Use fileio.ChannelFactory instead of TextFileSource

[markliu] Fix test_pipeline_test That Broke PostCommit

[davor] Removing some of the dataflow references.

[robertwb] Implement wait_until_finish method for existing runners.

[robertwb] Make blocking by default.

[robertwb] Changed tests in examples/ and io/ to use TestPipeline.

[robertwb] Update tests to use TestPipeline()

[robertwb] Add dependency comments to tox file.

[robertwb] Moving from a string-based buffer to a cStringIO based on in order to

[robertwb] Fix Incorrect Split in Test Pipeline Test

[robertwb] Metrics test in start/end_bundle for ParDos

[altay] Update tests and examples to use new labels.

[altay] update labels in iobase

[altay] Remove unneeded labels, and convert existing labels to UpperCamelCase.

[robertwb] Add tests for standard beam coder types.

[robertwb] Add a --fix option to the standard coder test that populates 

[robertwb] A couple more examples.

[robertwb] Make fail when the underlying execution fails.

[robertwb] DataflowRunner will raise an exception on failures.

[dhalperi] Clean *.pyc files with mvn clean.

[altay] Update DataflowPipelineResult.state at the end of

[robertwb] Implement Annotation based NewDoFn in python SDK

[robertwb] Add some typing to prevent speed regression for old_dofn.

[chamikara] Increments major version used by Dataflow runner to 5

[robertwb] Remove

[robertwb] Code cleanup now that all runners support windowed side inputs.

[robertwb] Revert "Remove"

[robertwb] Fix case where side inputs may be an iterable rather than a list.

[robertwb] Removes Dataflow native text source and sink from Beam SDK.

[robertwb] Install test dependencies in the post commit script.

[robertwb] Cleanup tests in pipeline_test.

[robertwb] Use a temp directory for requirements cache in

[robertwb] Revert "Revert "Remove""

[robertwb] Fix read/write display data

[robertwb] Refactoring metrics infrastructure

[Pablo] Updating dataflow client protos to add new metrics.

[dhalperi] Run lint on all files in the module.

[altay] Update pom.xml for sdks/python.

[davor] [BEAM-843] Use New DoFn Directly in Flink Runner

[davor] Update the file to match the latest beam version.

[davor] Updates places in SDK that creates thread pools.

[davor] Add mock time to slow bigquery unit tests.

[davor] Revert python-sdk only changes in travis, and clean incubator keywords.

[davor] Remove sdks/python/LICENSE

[davor] Move sdks/python/.gitignore to top-level .gitignore

[davor] Update Python post-commit Jenkins configuration
